Abstract


Modular verification is a technique used to face the state explosion 
problem often encountered in the verification of properties of complex 
systems such as concurrent interactive systems. The modular approach 
is based on the observation that properties of interest often concern a 
rather small portion of the system. As a consequence, reduced models 
can be constructed which approximate the overall system behaviour 
thus allowing more efficient verification. 

Biochemical pathways can be seen as complex concurrent interactive 
systems. Consequently, verification of their properties is often
computationally very expensive and could take advantage of the modular
approach. 

In this paper we develop a modular verification framework for 
biochemical pathways. We view biochemical pathways as concurrent 
systems of reactions competing for molecular resources. A modular 
verification technique could be based on reduced models containing 
only reactions involving molecular resources of interest. 

For a proper description of the system behaviour we argue that 
it is essential to consider a suitable notion of fairness, which is a 
well-established notion in concurrency theory but novel in the field of 
pathway modelling. The fairness notion we consider forbids starvation 
of reactions, namely it ensures that a reaction that is enabled infinitely 
often cannot always occur to the detriment of another infinitely often 
enabled reaction causing the latter to never occur. 

We prove the correctness of the approach and demonstrate it on the 
model of the EGF receptor-induced MAP kinase cascade by Schoeberl 
et al. 
1 Introduction


A big challenge of current biology is understanding the principles and
functioning of complex biological systems. Despite the great effort of molecular
biologists investigating the functioning of cellular components and networks,
we still cannot provide a detailed answer to the question “how does a cell work?”.

In the last decades, scientists have gathered an enormous amount of 
molecular level information. To uncover the principles of functioning of 
a biological system, just collecting data does not suffice. Actually, it is 
necessary to understand the functioning of parts and the way these interact 
in complex systems. The aim of systems biology is to build, on top of the 
data, the science that deals with principles of operation of biological systems. 
The comprehension of these principles is done by modelling and analysis 
exploiting mathematical means. 

A typical scenario of modelling a biological system is as follows. To 
build a model that explains the behaviour of a real biological system, first 
a formalism needs to be chosen. Then a model of the system is created, 
simulation is performed, and the behaviour is observed. The model is 
validated by comparing the results with the real experiments. Simulation 
allows not only validation of laboratory experiments, but also prediction of 
behaviour under new conditions. 

Depending on the considered simulation technique, simulation can 
give either the average system behaviour or a number of possible system 
behaviours. This may be insufficient when one is interested in analysing 
all the behaviours of a system. In these cases model checking may be of 
help. This technique permits the verification of properties (expressed as 
logical formulae) by exploring all the possible behaviours of a system. This 
analysis technique typically relies on a state space representation whose 
size, unfortunately, makes the analysis often intractable for realistic models. 
This is true in particular for systems of interest in systems biology (such 
as metabolic pathways, signalling pathways, and gene regulatory networks), 
which often consist of a huge number of components interacting in different 
ways, thus exhibiting very complex behaviours. 

Many formalisms originally developed by computer scientists to model 
systems of interacting components have been applied to biology, also with 
extensions to allow more precise descriptions of the biological behaviours 
[2, 6, 10, 14, 35, 36]. Examples of well-established formal frameworks that 
can be used to model, simulate and model check descriptions of biological 
systems are [10, 24, 27]. 
Model checking techniques have traditionally suffered from the state
explosion problem. Standard approaches to the solution of this problem
are based on abstractions or similar model reduction techniques (e.g. [11]).
Moreover, the use of Binary Decision Diagrams (BDDs) [12] to represent
the state space (symbolic model checking) often allows significantly larger
models to be treated [5].


A method for trying to avoid the state space explosion problem is to 
consider a decomposition of the system, and to apply a modular verification 
technique allowing global properties to be inferred from properties of the 
system components. This approach can be particularly efficient when the 
modelled systems consist of a high number of components, whereas properties 
of interest deal only with a rather small subset of them. This is often the case 
for properties of biological systems. Hence, for each property it would be 
useful to be able to isolate a minimal fragment of the model that is necessary 
for verifying such a property. If such a fragment can be obtained by working 
only on the syntax of the model, the application of a standard verification 
technique on the semantics of the fragment avoids the state explosion. 


In previous work we developed a modular verification technique in which 
the system of interest is described by means of a general automata-based 
formalism, called sync-programs, suitable for qualitative description of a 
large class of biological systems [19, 20]. Sync-programs include a notion of 
synchronisation that enables the modelling of biological systems and support 
modular construction of models. The modular verification technique is based
on property preservation and allows properties expressed
in the temporal logic ACTL⁻ [26] to be verified on fragments of models.
In order to handle modelling and verification of more realistic biological 
scenarios, we have proposed a dynamic version of our formalism along with 
an extension of the modular verification framework [18]. 


The long-term aim of our research is the development of an efficient
modular verification framework specifically designed for biochemical
pathways, and of a pathway analysis tool based on such a framework. At the first
stages of the development of the modular verification framework we faced
some problems whose solution required the definition of concepts
related to the formal modelling of biochemical pathways that we believe could
be interesting not only in the context of modular verification. In particular,
we defined a notion of fairness for biochemical pathways and a notion of 
molecular component of a pathway. The former is a well-known concept in 
concurrency theory that could be useful to describe more accurately the 
dynamics of a pathway (in a qualitative framework). The latter is a notion
relating species involved in the same pathway such that two species are 
considered to be part of the same molecular component if they can be seen 
as different states of the same molecule. As far as we know, the adoption of 
a notion of fairness in the context of biology is new. On the other hand, the 
notion of molecular component has been often implicitly used (for instance 
in the modelling of biological systems by means of automata), but now we 
provide new insight on this notion. 


In this paper we report preliminary results obtained during the
development of the modular verification framework. Modular verification requires
either adopting a modular notation for pathway modelling or finding a way 
to decompose a pathway, simply expressed as a set of biochemical reactions, 
into a number of modules. The approach that we choose to follow is in 
between these two alternatives. Actually, we assume the pathway to be 
expressed as a set of reactions in a “normal-form” satisfying some
modularisation requirements, and then we define a modularisation procedure that
allows modules to be inferred from reactions. In a recent paper [33] we 
have also defined a semi-automatic algorithm that allows any pathway to be 
transformed into normal-form. Hence, such an algorithm would allow our 
modular verification approach to be applied to any pathway. 


Modules inferred from reactions will be molecular components, hence 
our modularisation procedure will allow us to consider a pathway not only 
as a set of reactions, but also as a set of entities interacting with each other 
(through reactions) and consequently changing state. 


Once the molecular components of a pathway are identified, we can use 
them to decompose the verification of a global pathway property into the 
verification of a number of sub-properties related with groups of components. 
To this aim we define a projection operation that allows a model fragment 
describing the behaviour of a group of components to be obtained from a 
model describing the whole pathway. Such a projection operation is actually 
an abstraction function, since the behaviour of the group of components 
will be over-approximated (i.e. the model will include behaviours that are 
not present in the model of the whole pathway). By considering a suitable 
temporal logic for the specification of properties (namely ACTL⁻, a fragment
of the CTL logic consisting only of universally quantified formulae) we can 
prove that properties holding in model fragments obtained by projection 
also hold in the complete model of the pathway. 


Nothing can be said of properties that do not hold in a model fragment. 
They might be false in the model fragment since they were also false in the
original model, or they can be false in the model fragment because some
behaviours added by the projection operation violate them even though they were
true in the original model. In case a property turns out to be false in a model
fragment obtained by projection, it is sometimes possible to assess whether 
the property really does not hold in the original model by verifying some 
stronger negative property. We will show some examples of this approach. 

In order to verify properties of complete pathway models or of model 
fragments it is possible to translate them into the input language of an 
existing model checking tool. Specifically, we use the NuSMV model checker 
[9], which is a well-established and efficient instrument. 

We demonstrate the modular verification approach on the model of the 
EGF receptor-induced MAP kinase cascade by Schoeberl et al. [37] and we
discuss how we plan to continue the development of the approach to improve 
its efficiency. 


Related Work 


This paper is a revised extended version of [22]. The main improvement with 
respect to the previous version is a careful revision of the formal definition 
of the modular verification framework. In particular, we have revised the 
formal definition of the semantics both at the concrete (complete models) 
and abstract (projections) levels. Such a revision was aimed at avoiding 
unnecessary transitions as much as possible, and hence at reducing as much 
as possible the approximation in the abstract semantics. In addition, in 
order to improve presentation of concepts we changed the structure of the 
paper and added a number of examples. Finally, we included this section in 
which our approach is compared with related work. 

In [32] a reduction methodology for logical models of regulatory networks 
is proposed. The logical formalism is a framework that allows regulatory 
networks to be described as graphs where nodes are genes and arcs are 
interactions between genes (promotion/inhibition). The behaviour of a 
network can then be described as a state transition graph in which states 
describe the expression level of each modelled gene and transitions describe
changes in such expression levels due to interactions. The proposed reduction
methodology essentially consists in iteratively removing individual nodes by
defining bypass interactions from the regulators to the targets of each of 
them. The logical formalism has also been translated into Petri nets in [8]. 
The translation encodes a node of a graph as a pair of places in the net, gene 
interactions as net transitions and the expression level of the genes as the
marking of the net. Model reductions are defined also at the net level and 
are aimed at removing unnecessary transitions. 


Our modular verification framework is partially related with the
approach based on logical modelling. Both approaches are based on formal
qualitative discrete models of biological networks, and both aim at verifying
behavioural properties. The approaches in [32, 8], however, are specific for 
gene regulatory networks, whereas our approach is more general since it 
aims at describing also reactions occurring between proteins, usually in the 
context of metabolic and signalling pathways. Moreover, both approaches 
face the problem of model reduction. The reduction approach we follow 
is similar, in principle, to the one in [32]. Namely, in both cases we have 
a model that is reduced by somehow removing parts of it describing some 
biological entities. However, the fact that the logical framework is specifically 
designed for regulatory networks makes the operation easier in that case (it 
suffices to remove one node from a graph and adapt the involved edges). In 
the case of our modular verification framework, removing parts from the 
model is more complex since the described biological entities (e.g. proteins) 
usually have a much more complex behaviour than genes (as they are seen 
in regulatory networks). 


In [16, 17] biological networks are described and analysed by means of 
a discrete theoretical framework in which biological entities are modelled as 
agents that can change state depending on states of the other agents. The 
framework can be used to study different kinds of properties (asymptotic 
dynamics, causality properties, etc.) of different kinds of biological networks.
In particular, in [17] modularity of interaction networks is considered by 
studying the conditions for module formation and by characterising the 
relations between the global behaviour of a network and the local behaviours 
of its components. The approach in [17] has aims similar to the ones of 
our approach. However, [17] focuses on asymptotic dynamics of networks, 
whereas our approach deals with behavioural properties expressed as modal 
logic formulae. Moreover, the nature of module in [17] is different from that 
of molecular component in our framework since the former is essentially 
a portion of the network, whereas the latter is the representation of an 
individual biological entity. 


In [3] conditions are investigated for the preservation of the behaviour 
of a regulatory network when it is embedded into a larger network. Hence, 
when such conditions are proved for a subnetwork of a larger regulatory 
network we obtain that any property of the subnetwork can be verified on the
subnetwork model rather than on the (larger) model of the whole regulatory 
network. This makes the approach in [3] related with ours. However, our 
approach is more general for two reasons: (i) it is not specific for regulatory 
networks, and (ii) it is not limited to the cases of components (similar to 
subnetworks) whose behaviour is completely independent from the rest of 
the system. What really matters in our case is that only the properties to
be verified (and not the whole component behaviour) do not depend on
the behaviour of the rest of the system.

Feret et al. [13, 25] developed a reduction technique for pathway models 
based on an abstraction technique that groups together (fragments of) species 
representing different configurations of the same molecular complexes the 
behaviour of which is the same. Such an approach can significantly reduce 
the size of models by avoiding the combinatorial blow up of species. The 
approach proposed by Feret et al. is completely different from ours so that 
in principle it might be possible to combine the two approaches. 


2 Modelling Notation for Biochemical Pathways 


In biochemistry, metabolic pathways are networks of biochemical reactions 
occurring within a cell. The reactions are connected by their intermediates: 
products of one reaction are substrates for subsequent reactions. Reactions 
are influenced by catalysts and inhibitors, which are molecules (proteins) 
which can stimulate and block the occurrence of reactions, respectively. For 
the sake of simplicity we do not consider inhibitors in this paper, although 
they could be easily dealt with. 

Given a set of species $S$, let us assume biochemical reactions constituting
a pathway to have the following form:

\[ r_1, \ldots, r_n \rightarrow p_1, \ldots, p_{n'} \ \{c_1, \ldots, c_m\} \]

where $r_i$, $p_i$ and $c_i$, for suitable values of $i$, are all in $S$. The $r_i$ are
reactants, the $p_i$ are products and the $c_i$ are catalysts of the considered reaction.
Given a reaction $R$ we define $re(R) = \{r_1, \ldots, r_n\}$, $pro(R) = \{p_1, \ldots, p_{n'}\}$,
and $cat(R) = \{c_1, \ldots, c_m\}$. We denote the set of species involved in reaction
$R$ as $species(R) = re(R) \cup pro(R) \cup cat(R)$. The set of all reactions over a
given set of species $S$ is denoted by $reactions(S)$. Finally, a pathway is a set
of reactions, $P = \{R_1, \ldots, R_n\} \subseteq reactions(S)$. Given a pathway $P$, we can
infer the set of species involved in it as $species(P) = \bigcup_{R \in P} species(R)$.
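
As an illustration only, the notation above can be encoded directly with sets. The following Python sketch is not part of the formalism; the names `Reaction` and `species_of_pathway`, as well as the two-reaction pathway, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """A reaction R with reactants re(R), products pro(R) and catalysts cat(R)."""
    name: str
    re: frozenset
    pro: frozenset
    cat: frozenset = frozenset()

    def species(self):
        # species(R) = re(R) U pro(R) U cat(R)
        return self.re | self.pro | self.cat


def species_of_pathway(pathway):
    """species(P): union of species(R) over all reactions R in P."""
    result = frozenset()
    for reaction in pathway:
        result |= reaction.species()
    return result


# Hypothetical pathway with two reactions: A -> B {D} and B -> C.
r1 = Reaction("R1", frozenset({"A"}), frozenset({"B"}), frozenset({"D"}))
r2 = Reaction("R2", frozenset({"B"}), frozenset({"C"}))
pathway = {r1, r2}

print(sorted(species_of_pathway(pathway)))  # ['A', 'B', 'C', 'D']
```

Using frozen sets mirrors the choice of sets rather than multisets for reactants, products and catalysts.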
Note that, in the present paper, we use sets rather than multisets in
reactions. This is done since, as discussed in the next section, we choose 
to describe pathways at a very high level of abstraction. Moreover, using 
sets allows us to simplify the presentation of our approach. Note that all of 
the concepts we will define could be defined by using multisets in place of 
simple sets and the results would hold even in such a case. 


2.1 Semantics 


In general, the dynamics of a pathway can be described at several different 
levels of abstraction. The most precise level consists of a quantitative 
description in which quantities (or concentrations) of species are taken into 
account, as well as reaction rates in either a deterministic or a stochastic 
framework. At a more abstract level reaction rates can be ignored. Ultimately,
also quantities of species can be ignored by considering only their presence (or
absence) in the considered biochemical solution. The least abstract description
level is obviously the most precise, but also the most difficult to treat with
formal analysis techniques. The more abstract levels are more suitable for 
the application of formal analysis techniques and are often precise enough 
to provide useful information on the role of the species and of the reactions 
involved in the pathway. We choose to adopt the most abstract description 
level, and hence we define a qualitative formal semantics of pathways in 
which species can only be either present or absent. This is a rather common 
choice, done for instance also in [15, 4]. 

Starting from the initial state representing a biochemical solution, the 
dynamics of the pathway is driven by the reactions. The occurrence of a 
reaction may cause the appearance of some new species in the biochemical 
solution. In this paper, we choose to interpret the effect of a reaction 
depending on whether it is catalysed or not. In particular, the application 
of a reaction always creates the products, but we choose reactants to be 
consumed only by catalysed reactions. In other words, a reaction without 
catalysts creates the products but does not consume the reactants, while 
a reaction with catalysts creates the products and consumes the reactants. 
We choose this interpretation since non-catalysed reactions usually reach a 
steady-state of dynamic equilibrium in which both reactants and products 
are present in the biochemical solution. On the other hand, a reaction 
favoured by catalysts usually tends to be performed as long as there are 
reactants. A consequence of this assumption is that a reversible reaction in 
which both directions are catalysed, which frequently occurs in biological 
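pathways, oscillates between two states. This is realistic in some cases, such
as in the case of oscillatory behaviours, but not always. We leave a more
detailed treatment of this aspect as future work.

\[
\frac{re(R) \subseteq s \qquad pro(R) \not\subseteq s \qquad cat(R) = \emptyset}
     {s \xrightarrow{R} s \cup pro(R)} \quad \text{(no-cat)}
\]

\[
\frac{re(R) \subseteq s \qquad re(R) \not\subseteq pro(R) \,\lor\, pro(R) \not\subseteq s \qquad \emptyset \neq cat(R) \subseteq s}
     {s \xrightarrow{R} (s \setminus re(R)) \cup pro(R)} \quad \text{(cat)}
\]

\[
\frac{\forall R.\ s \overset{R}{\nrightarrow}}
     {s \xrightarrow{\varepsilon} s} \quad \text{(deadlock)}
\]

Figure 1: Inference rules of the semantics.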

Technically, the semantics is formalised as a Labelled Transition System
(LTS), where each state corresponds to a set of species, and transitions
are labelled by the reaction which is applied. Formally, an LTS is a tuple
$(S, \Sigma, \rightarrow)$, where $S$ is the set of states, $\Sigma$ is the set of labels, and
${\rightarrow} \subseteq S \times \Sigma \times S$ is the transition relation.


Definition 1 (Semantics). Let $P = \{R_1, \ldots, R_n\}$ be a pathway, and let
$\varepsilon \notin P$. The semantics of $P$ is defined as the LTS $(\mathcal{P}(species(P)), P \cup \{\varepsilon\}, \rightarrow)$,
where $\rightarrow$ is the least transition relation satisfying the inference rules of
Figure 1.


Rules (no-cat) and (cat) formalise the dynamics of reactions in the 
absence and presence of catalysts, respectively. A non-catalysed
reaction is applicable only if its reactants are present and some
product is missing. In this manner, since the resulting state is obtained
from the starting state by adding the products, applications of reactions
which would not change the state of the system are omitted. On the other
hand, in order for a catalysed reaction to occur, all of its reactants and
catalysts are required to be present. The resulting state is obtained by first
removing the reactants (which are thus consumed) and then adding the
products. Similarly to the previous case, only transitions causing a state
change are considered, as ensured by condition $re(R) \not\subseteq pro(R) \lor pro(R) \not\subseteq s$.
Alternative combinations of catalysts that may enable the reaction should be 
modelled as different reactions having the same reactants and products. In 
general, excluding transitions which do not change the state of the system is
convenient for the verification as the size of the transition system is smaller 
but the set of properties that hold stays the same. 

Finally, rule (deadlock) provides each state in which no reaction is
enabled (denoted $\forall R.\ s \overset{R}{\nrightarrow}$) with a self-looping transition with empty label,
denoted $\varepsilon$. This rule was not present in [22], and its aim is simply to turn
every finite path into an equivalent infinite one in order to simplify the
definition of the modular verification methodology.
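
The three rules can be read operationally as a successor function on states. The sketch below is one possible encoding, not the authors' implementation: reactions are assumed to be triples of frozen sets (reactants, products, catalysts), the label `"eps"` stands for the empty label, and the small pathway at the end is hypothetical.

```python
def enabled(re, pro, cat, s):
    """True if the reaction (re, pro, cat) has a transition from state s."""
    if not re <= s:
        return False
    if not cat:
        # rule (no-cat): some product must be missing
        return not pro <= s
    # rule (cat): catalysts present and the transition changes the state
    return cat <= s and (not re <= pro or not pro <= s)


def successors(pathway, s):
    """Labelled transitions leaving state s, as (label, next_state) pairs."""
    result = []
    for name, (re, pro, cat) in pathway.items():
        if enabled(re, pro, cat, s):
            # non-catalysed reactions keep their reactants, catalysed ones consume them
            nxt = (s | pro) if not cat else ((s - re) | pro)
            result.append((name, frozenset(nxt)))
    if not result:
        # rule (deadlock): self-loop with the empty label
        result.append(("eps", s))
    return result


# Hypothetical pathway: R1 = A -> B (non-catalysed), R2 = B -> C {D} (catalysed).
F = frozenset
pathway = {
    "R1": (F({"A"}), F({"B"}), F()),
    "R2": (F({"B"}), F({"C"}), F({"D"})),
}
print(successors(pathway, F({"A", "D"})))       # R1 adds B and keeps A
print(successors(pathway, F({"A", "B", "D"})))  # R2 consumes B and adds C
print(successors(pathway, F({"D"})))            # nothing enabled: eps self-loop
```

Because the deadlock case always contributes one transition, every state has at least one successor, which is what makes all paths infinite.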

In the following, we denote the semantic function as $LTS$, namely
$LTS(P) = (\mathcal{P}(species(P)), P \cup \{\varepsilon\}, \rightarrow)$, for some pathway $P$. A path in
$LTS(P)$ is an infinite sequence $\pi = s_0, R_0, s_1, R_1, s_2, \ldots$ where for all $i \geq 0$,
$s_i \xrightarrow{R_i} s_{i+1}$. We denote by $\pi^k$ the suffix of $\pi$ starting at $s_k$, namely $\pi^k =
s_k, R_k, s_{k+1}, R_{k+1}, s_{k+2}, \ldots$


Example 1. Let $P$ be a pathway consisting of the following reactions:


$P_1, P_2 \rightarrow C$  ($R_1$)          $P_1', M_2 \rightarrow D \ \{E_1\}$  ($R_5$)
$P_1 \rightarrow P_1'$  ($R_2$)            $D \rightarrow P_1', M_2 \ \{E_2\}$  ($R_6$)
$P_1' \rightarrow P_1$  ($R_3$)            $D \rightarrow \ \{C, M_1'\}$  ($R_7$)
$M_1 \rightarrow M_1' \ \{P_1'\}$  ($R_4$)


The fragment of $LTS(P)$ rooted at $\{P_1, P_2, M_1, M_2, E_1, E_2\}$ is shown in
Figure 2. Note that no transition for reaction $R_3$ is present, since in all states
in which its reactants are present its products are present as well (e.g. state $s_1$).
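
The state space of this example is small enough to be explored exhaustively. The following sketch does so under our reading of the reactions of Example 1: the primed species are written `P1p` and `M1p`, and reaction R7 is assumed to have an empty set of products. The function names are hypothetical.

```python
from collections import deque

F = frozenset
# Reactions as (reactants, products, catalysts); an assumed reading of Example 1.
PATHWAY = {
    "R1": (F({"P1", "P2"}), F({"C"}), F()),
    "R2": (F({"P1"}), F({"P1p"}), F()),
    "R3": (F({"P1p"}), F({"P1"}), F()),
    "R4": (F({"M1"}), F({"M1p"}), F({"P1p"})),
    "R5": (F({"P1p", "M2"}), F({"D"}), F({"E1"})),
    "R6": (F({"D"}), F({"P1p", "M2"}), F({"E2"})),
    "R7": (F({"D"}), F(), F({"C", "M1p"})),  # assumption: no products
}


def successors(s):
    """Transitions from state s according to rules (no-cat), (cat), (deadlock)."""
    out = []
    for name, (re, pro, cat) in PATHWAY.items():
        if not re <= s:
            continue
        if not cat and not pro <= s:
            out.append((name, s | pro))
        elif cat and cat <= s and (not re <= pro or not pro <= s):
            out.append((name, (s - re) | pro))
    return out or [("eps", s)]


def reachable(s0):
    """Breadth-first exploration of the states reachable from s0."""
    seen, queue = {s0}, deque([s0])
    while queue:
        s = queue.popleft()
        for _, t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


S0 = F({"P1", "P2", "M1", "M2", "E1", "E2"})
states = reachable(S0)
print(len(states))  # 16
```

Under these assumptions the exploration finds 16 states, in agreement with the names $s_0, \ldots, s_{15}$ used in Figure 2, and no transition labelled R3.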


2.2 Fairness 


In order to describe the behaviour of a pathway more accurately we consider 
a notion of fairness. We motivate it by considering a quantitative system
consisting of four reactions $A \xrightarrow{k_1} B \ \{D\}$, $B \xrightarrow{k_2} A \ \{D\}$, $A \xrightarrow{k_3}
C \ \{D\}$ and $C \xrightarrow{k_4} A \ \{D\}$, where $k_1$, $k_2$, $k_3$ and $k_4$ are the reaction rates.
By performing the qualitative abstraction, we get a pathway containing
reactions $R_1 = A \rightarrow B \ \{D\}$, $R_2 = B \rightarrow A \ \{D\}$, $R_3 = A \rightarrow C \ \{D\}$ and
$R_4 = C \rightarrow A \ \{D\}$, whose semantics as defined above includes behaviours
such as the one where $R_3$ never occurs. Such a behaviour is a qualitative
abstraction which is not correct, since the standard quantitative dynamics
ruled by the law of mass action would imply that both $R_1$ and $R_3$ occur
with a frequency proportional to their kinetic rates. Actually, in a stochastic
setting both $R_1$ and $R_3$ would occur infinitely often with probability 1. A
correct qualitative abstraction of our system should therefore only include
paths in which both $R_1$ and $R_3$ occur infinitely many times.


[Transition diagram. States: $s_0 = P_1 P_2 M_1 M_2 E_1 E_2$; $s_1 = P_1 P_1' P_2 M_1 M_2 E_1 E_2$;
$s_2 = P_1 P_2 M_1 D E_1 E_2$; $s_3 = P_1 P_1' P_2 M_1 D E_1 E_2$; $s_4$, $s_5$, $s_6$ are $s_1$, $s_2$, $s_3$ with
$M_1'$ in place of $M_1$; $s_7$ to $s_{10}$ are $s_0$ to $s_3$ with $C$ added; $s_{11}$ to $s_{13}$ are $s_4$ to $s_6$
with $C$ added; $s_{14} = P_1 P_2 C M_1' E_1 E_2$; $s_{15} = P_1 P_1' P_2 C M_1' E_1 E_2$.]

Figure 2: Representation of $LTS(P)$ rooted at $s_0 = \{P_1, P_2, M_1, M_2, E_1, E_2\}$,
where $P$ is as defined in Example 1. Set denotations (brackets and commas)
are omitted in state representations, whereas each state is associated with a
short name $s_i$, for $i \in [0, 15]$.


\[
\begin{array}{lcl}
\pi \models_{LTL} s & \iff & \exists s', R, \pi'.\ \pi = s', R, \pi' \text{ and } s \in s' \\
\pi \models_{LTL} \neg f & \iff & \pi \not\models_{LTL} f \\
\pi \models_{LTL} f \lor g & \iff & \pi \models_{LTL} f \text{ or } \pi \models_{LTL} g \\
\pi \models_{LTL} X\, g & \iff & \exists s', R, \pi'.\ \pi = s', R, \pi' \text{ and } \pi' \models_{LTL} g \\
\pi \models_{LTL} X_R\, g & \iff & \exists s', \pi'.\ \pi = s', R, \pi' \text{ and } \pi' \models_{LTL} g \\
\pi \models_{LTL} f \;U\; g & \iff & \exists k \geq 0.\ \pi^k \models_{LTL} g \text{ and } \forall\, 0 \leq j < k.\ \pi^j \models_{LTL} f
\end{array}
\]

Figure 3: Satisfaction relation for LTL formulae.


A concept from concurrency theory that allows the correct behaviour
to be specified is fairness, which stipulates that reactions should compete in a
fair manner. We consider the well-known notion of strong fairness [23], also 
called compassion, which requires that if a reaction is enabled (ready to 
occur) infinitely many times, then it will occur infinitely many times. 


Technically, fairness is specified by a linear temporal logic [34] (LTL)
formula. For this reason, we briefly recall such a logic, instantiated to our
setting, before formally defining fairness. In particular, we consider a variant
of LTL with an action-specific next modality $X_R$, where $R$ is a transition
label, corresponding to a reaction in our setting. Given a finite set of species
$S$ (cf. atomic propositions in the canonical definition), the syntax of LTL
formulae is given by $f ::= s \mid \neg f \mid f \lor g \mid X\, g \mid X_R\, g \mid f \;U\; g$, where $f, g$ are
meta-variables denoting LTL formulae, $s \in S$, and $R$ is a reaction. Given a
pathway $P$, let us consider a Labelled Transition System $LTS$ with states
$\mathcal{P}(species(P))$ and labels $P \cup \{\varepsilon\}$; the satisfaction relation $\models_{LTL}$ of an LTL
formula $f$ with respect to a path $\pi \in LTS$, denoted $\pi \models_{LTL} f$, is defined
in Figure 3. Additional logical operators can be defined: $true = s \lor \neg s$,
$false = \neg true$, $f \land g = \neg(\neg f \lor \neg g)$ and $f \Rightarrow g = \neg f \lor g$. Moreover, the additional
temporal operators eventually, $F\, g = true \;U\; g$, and globally, $G\, g = \neg F\, \neg g$,
are usually defined.
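
As a worked example, the derived operators can be unfolded using the clause for $U$ and the suffix notation $\pi^k$. The derivation below is added for illustration; it relies only on the standard identity $(\pi^k)^j = \pi^{k+j}$.

\[
\begin{array}{lcl}
\pi \models_{LTL} F\, g & \iff & \exists k \geq 0.\ \pi^k \models_{LTL} g \\
\pi \models_{LTL} G\, g & \iff & \forall k \geq 0.\ \pi^k \models_{LTL} g \\
\pi \models_{LTL} G\,F\, g & \iff & \forall k \geq 0.\ \exists j \geq k.\ \pi^j \models_{LTL} g
\end{array}
\]

The last line is the usual reading of $G\,F\, g$ as "$g$ holds infinitely often along $\pi$".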


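Definition 2 (Fairness). Let $P$ be a pathway. A path $\pi$ in $LTS(P)$ is fair
iff it satisfies the following LTL formula:

\[
\Phi = \bigwedge_{R \in P} \bigl( (G\,F\; enabled(R)) \Rightarrow (G\,F\; occurs(R)) \bigr)
\]

where

\[
enabled(R) =
\begin{cases}
\bigl(\bigwedge_{r \in re(R)} r\bigr) \land \bigl(\bigwedge_{c \in cat(R)} c\bigr)
  & \text{if } cat(R) \neq \emptyset,\ re(R) \not\subseteq pro(R); \\[4pt]
\bigl(\bigwedge_{r \in re(R)} r\bigr) \land \bigl(\bigvee_{p \in pro(R)} \neg p\bigr) \land \bigl(\bigwedge_{c \in cat(R)} c\bigr)
  & \text{otherwise;}
\end{cases}
\]

\[
occurs(R) = X_R\; true.
\]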


The definition of fairness requires each reaction which is enabled
infinitely often in the path (GF enabled(R)) to also occur infinitely often
(GF occurs(R)). This prevents a component which can progress indefinitely
from blocking the application of reactions belonging to other components.

Given a reaction R, formula occurs(R) is satisfied only if there is a
transition labelled by R exiting from the state, denoting the application of
such a reaction. As regards formula enabled(R), according to the format of
the reaction R (i.e. either catalysed or non-catalysed) it is satisfied only
if the reaction is applicable in the state. The first case deals with catalysed
reactions for which there is at least one reactant which is not also a product.
In this case, the reaction is enabled if both reactants and catalysts are
present, regardless of the products already present in the state (since the
state is modified by removing the reactants). The second case covers
the situations in which either the reaction is not catalysed, or it is catalysed and its
reactants are a subset of the products; in both situations, apart from the reactants and
catalysts, which need to be present, there must also be at least one product
missing.


Example 2. Let $P$ be the pathway defined in Example 1, and $LTS(P)$ be
its semantics. According to the definition of fairness, the following path is
fair:

\[ \pi_1 = s_0, R_2, s_1, R_5, s_2, R_2, s_3, R_1, s_{10}, R_4, s_{13}, R_7, s_{15}, \varepsilon, s_{15}, \varepsilon, \ldots \]

On the other hand, the following path in which the system loops between
states $s_1$ and $s_2$ is not fair:

\[ \pi_2 = s_0, R_2, s_1, R_5, s_2, R_6, s_1, R_5, s_2, R_6, s_1, R_5, s_2, R_6, \ldots \]

Indeed, in $\pi_2$ reaction $R_2$ is infinitely often enabled (every time the system
is in state $s_2$) but it occurs only once. In other words reaction $R_6$, which
competes with $R_2$ for application in $s_2$, always wins the competition thus
causing starvation of $R_2$.
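
Both paths of this example are ultimately periodic: a finite prefix followed by a cycle repeated forever. For such paths a reaction is enabled (or occurs) infinitely often exactly when it is enabled (or occurs) somewhere in the cycle, so fairness can be checked on the cycle alone. The sketch below illustrates this under two assumptions: the restriction to ultimately periodic paths, and our reading of the reactions of Example 1 (primed species written `P1p` and `M1p`, R7 without products). The names `enabled` and `starved` are hypothetical.

```python
F = frozenset
# Reactions as (reactants, products, catalysts); an assumed reading of Example 1.
PATHWAY = {
    "R1": (F({"P1", "P2"}), F({"C"}), F()),
    "R2": (F({"P1"}), F({"P1p"}), F()),
    "R3": (F({"P1p"}), F({"P1"}), F()),
    "R4": (F({"M1"}), F({"M1p"}), F({"P1p"})),
    "R5": (F({"P1p", "M2"}), F({"D"}), F({"E1"})),
    "R6": (F({"D"}), F({"P1p", "M2"}), F({"E2"})),
    "R7": (F({"D"}), F(), F({"C", "M1p"})),  # assumption: no products
}


def enabled(reaction, s):
    """enabled(R) from the fairness definition, evaluated in state s."""
    re, pro, cat = reaction
    if cat and not re <= pro:
        # catalysed, with some reactant that is not also a product
        return re <= s and cat <= s
    # otherwise at least one product must be missing
    return re <= s and cat <= s and not pro <= s


def starved(cycle):
    """Reactions enabled somewhere in the cycle that never label one of its steps.

    cycle is a list of (state, label) pairs repeated forever.
    """
    occurring = {label for _, label in cycle}
    return [name for name, reaction in PATHWAY.items()
            if name not in occurring
            and any(enabled(reaction, s) for s, _ in cycle)]


BASE = F({"P1", "P2", "E1", "E2"})
s1 = BASE | {"P1p", "M1", "M2"}
s2 = BASE | {"M1", "D"}
s15 = BASE | {"P1p", "C", "M1p"}

PI1_CYCLE = [(s15, "eps")]            # final loop of the first path
PI2_CYCLE = [(s1, "R5"), (s2, "R6")]  # loop between s1 and s2

print(starved(PI1_CYCLE))  # []
print(starved(PI2_CYCLE))  # ['R1', 'R2', 'R4']
```

A path of this shape is fair exactly when `starved` returns an empty list. For the second path the list contains R2, the reaction discussed above; under this reading R1 and R4 are starved as well.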
3 Identification of Molecular Components


The approach to modular verification of pathways that we propose is built 
upon the concept of molecular component. The main observation is that 
pathways are usually composed of a few basic biological entities which 
interact; for example, a protein can be involved in a series of transformations, 
starting from its initial synthesised form, which can then be activated and 
later become part of different complexes. 

A typical aspect of a biochemical pathway is that the process described 
involves mainly a few chains of reactions, in which the occurrence of a reaction 
produces some intermediate molecule which is then transformed subsequently 
by other reactions. Intuitively, this allows us to regard the intermediate 
molecular species which appear in the pathway as different “states” or 
“configurations” of the same initial biological entity, and thus the reactions 
as a synchronised state change of a set of such basic entities. According 
to this view, in the modelling of biochemical pathways we can consider a 
notion of molecular component [22, 21] that is the formal counterpart of the 
notion of biological entity. A molecular component thus groups together all 
the species mentioned in the model which correspond to different states of 
the same biological entity. 

In the previous section we have introduced the syntax of the modelling 
formalism of biochemical pathways, in which a reaction is allowed to have 
a different number of reactants and products. In order to identify the 
components of a pathway, we assume that each reaction has the same number 
of reactants and products. Moreover, we assume a positional correspondence
between the reactants and the products; in particular, we assume that product
$p_i$ is the result of the transformation of reactant $r_i$ by the reaction.

The idea behind this assumption on the form of reactions is that 
reactions of cellular pathways very often represent bindings (and unbindings) 
of well-defined macromolecules, such as proteins and genes, to form (or to 
break) complexes either with other macromolecules or with small molecules 
such as ions and nutrients. Also conformational changes are common, in 
which a protein (or a complex constituted by a few proteins) changes its 
own “state”. If we consider a complex not as a single species, but as a 
set of species, one for each molecule involved in it, we have that all of the 
mentioned kinds of reaction turn out to have as many products as reactants.

In general, pathways are not expressed using this kind of “normal-form”
reactions, with positional correspondence between reactants and products.
We argue that, under conditions often found in practice, a pathway can be
rewritten in normal form by making explicit the molecular species involved in
each reaction, thus allowing its decomposition into components. The
translation of a generic pathway model into a normal-form pathway has
been addressed in [33], in which we proposed a semi-automatic algorithm for
component identification in SBML pathway models. In [30] the algorithm
has been tested on a vast array of real pathway models expressed in
the SBML notation [28], namely on all of the 436 models available in the
BioModels database [29] under the category “curated models”. The results of
the test showed that in most cases the algorithm was able to transform
the pathways into normal form (and also to infer molecular components) in
a fully automatic way.


Example 3. Let P be the pathway defined in Example 1. The corresponding
pathway in “normal-form” P' consists of the following reactions:

$$\begin{array}{ll}
P_1, P_2 \to C_{P_1}, C_{P_2} & (R'_1) \\
P_1 \to P_1^* & (R'_2) \\
P_1^* \to P_1 & (R'_3) \\
M_1 \to M_1^* \ \{P_1^*\} & (R'_4) \\
P_1^*, M_2 \to D_{P_1^*}, D_{M_2} \ \{E_1\} & (R'_5) \\
D_{P_1^*}, D_{M_2} \to P_1^*, M_2 \ \{E_2\} & (R'_6) \\
D_{P_1^*}, D_{M_2} \to Z_{P_1^*}, Z_{M_2} \ \{M_1^*, C_{P_1}, C_{P_2}\} & (R'_7)
\end{array}$$

where species $C$, representing the complex obtained by the binding of $P_1$ with
$P_2$, has been replaced by the two species $C_{P_1}$ and $C_{P_2}$ representing the bound
states of $P_1$ and $P_2$, respectively. Similarly, complex $D$ has been replaced
(three times) by $D_{P_1^*}$ and $D_{M_2}$. Moreover, dummy species $Z_{P_1^*}$ and $Z_{M_2}$ have
been introduced to represent the degraded states of $D_{P_1^*}$ and $D_{M_2}$, respectively.


3.1 Component Inference from Normal-Form Pathways 


We present an algorithm that, given a pathway P, infers the components
appearing in it by returning a partition of the species into sets, each
corresponding to a different component. We assume that the pathway P
consists of normal-form reactions $r_1, \ldots, r_n \to p_1, \ldots, p_n \ \{c_1, \ldots, c_m\}$ in
which there is a one-to-one correspondence between reactants and products.

We illustrate the intuitive idea on an example. Each reaction can be seen
as a synchronisation of components. For example, reaction $r_1, r_2 \to p_1, p_2 \ \{c\}$
can be interpreted as a synchronisation of three components: one that
changes its state from $r_1$ into $p_1$, another component that changes its state
from $r_2$ into $p_2$, and a component which participates passively and stays in
a state $c$. Since we suppose that only one reaction takes place at a time in
the whole system, the states of all the components do not change other than
those involved in the reaction in the way we described. From the example
we can see that species $r_1$ and $p_1$ belong to the same component. Similarly,
$r_2$ belongs to the component that contains $p_2$, while $c$ is from a separate one.


Algorithm 1 Algorithm to partition species into different components

    Let $map : S \to \mathcal{I}$ be a total injective mapping
    for all R in P do
        let $R = r_1, \ldots, r_n \to p_1, \ldots, p_n \ \{c_1, \ldots, c_m\}$
        for all $j \in \{1, \ldots, n\}$ do
            $map := \lambda x.\ \begin{cases} map(r_j) & \text{if } map(x) = map(p_j) \\ map(x) & \text{if } map(x) \neq map(p_j) \end{cases}$
        end for
    end for
    return map

The algorithm to infer components from a normal-form pathway is
given in Algorithm 1. It assumes an infinite set of component names $\mathcal{I}$.
Initially, each species is assumed to belong to a different component; the
algorithm then refines this assumption by iterating over the reactions constituting
P. For each reaction in the pathway, the algorithm updates the mapping by
unifying the component assigned to the i-th reactant with that of the i-th product, for
all $i \in \{1, \ldots, n\}$. The result of the algorithm is a mapping map assigning
each species to its component.

In the following, we denote by comp(P) the set of components of a 
given pathway P; formally, it is defined as the image of mapping map. Using 
the same notation, we denote by comp(R) = comp({R}) the components of 
a given reaction R. 


Example 4. Let P’ be the pathway defined in Example 3. The set of
components comp(P’) computed by the algorithm is as follows:

$$comp(P') = \{\ \{P_1, C_{P_1}, P_1^*, D_{P_1^*}, Z_{P_1^*}\},\ \{P_2, C_{P_2}\},\ \{M_1, M_1^*\},\ \{M_2, D_{M_2}, Z_{M_2}\},\ \{E_1\},\ \{E_2\}\ \}$$

Six components are inferred from reactions. Each component is the set of
states of a different entity involved in the pathway.
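
As an illustration of Algorithm 1, the following is a minimal Python sketch that unifies the i-th reactant with the i-th product by means of a union-find structure, which is one possible way to realise the mapping update and not necessarily how the authors implemented it. The function name `infer_components`, the ASCII species names (`P1s` for $P_1^*$, `C_P1` for $C_{P_1}$, and so on) and the encoding of reactions as triples are assumptions made for this example; the reaction list is our reading of Example 3.

```python
from collections import defaultdict

# Each reaction: (reactants, products, catalysts); reactant i becomes product i.
# Hypothetical ASCII encoding of the normal-form pathway of Example 3.
REACTIONS = {
    "R1": (["P1", "P2"], ["C_P1", "C_P2"], []),
    "R2": (["P1"], ["P1s"], []),
    "R3": (["P1s"], ["P1"], []),
    "R4": (["M1"], ["M1s"], ["P1s"]),
    "R5": (["P1s", "M2"], ["D_P1s", "D_M2"], ["E1"]),
    "R6": (["D_P1s", "D_M2"], ["P1s", "M2"], ["E2"]),
    "R7": (["D_P1s", "D_M2"], ["Z_P1s", "Z_M2"], ["M1s", "C_P1", "C_P2"]),
}

def infer_components(reactions):
    """Partition species so that r_i and p_i always share a component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)           # initially every species is alone
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for re, pro, cat in reactions.values():
        for c in cat:
            find(c)                        # catalysts are registered, not merged
        for r, p in zip(re, pro):
            parent[find(p)] = find(r)      # unify i-th reactant and product

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return {frozenset(g) for g in groups.values()}

comps = infer_components(REACTIONS)
```

On this input the sketch yields six sets, in agreement with Example 4; catalysts such as `E1` remain singletons because they never appear as reactants or products.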


A component interaction graph can be drawn which visualises the
components of a pathway and their interactions. It is a directed graph in which
vertices are system components (elements of comp(P)) and edges connect
components that are involved together in a reaction. If two components are
both involved as reactants (and consequently products), the edge connecting
them is not oriented. If one of the two is involved as reactant and
the other as catalyst, then the edge goes from the vertex representing
the catalyst to the vertex representing the reactant. There is no edge between
vertices representing components involved in the same reactions only as
catalysts. An example of a component interaction graph will be shown in
Section 5.3.
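
The edge rules just described can be sketched in a few lines of Python. This is only an illustrative reading of the rules; the function name `interaction_graph`, the triple encoding of reactions and the toy data are assumptions of the example.

```python
def interaction_graph(reactions, comp_of):
    """Return (undirected, directed) edge sets between components.

    reactions: iterable of (reactants, products, catalysts) lists of species
    comp_of:   dict mapping each species to the name of its component
    """
    undirected, directed = set(), set()
    for re, _pro, cat in reactions:
        reactant_comps = {comp_of[s] for s in re}
        catalyst_comps = {comp_of[s] for s in cat}
        # two components both acting as reactants: non-oriented edge
        undirected |= {frozenset((a, b))
                       for a in reactant_comps for b in reactant_comps if a != b}
        # catalyst and reactant: edge from the catalyst to the reactant
        directed |= {(c, r) for c in catalyst_comps
                     for r in reactant_comps if c != r}
        # components that meet only as catalysts get no edge
    return undirected, directed

# Toy input: one reaction A, B -> A', B' catalysed by E and F.
toy_comp_of = {"A": "cA", "A'": "cA", "B": "cB", "B'": "cB", "E": "cE", "F": "cF"}
toy_reactions = [(["A", "B"], ["A'", "B'"], ["E", "F"])]
und, dirs = interaction_graph(toy_reactions, toy_comp_of)
```

Here `und` contains the single non-oriented edge between `cA` and `cB`, while `dirs` contains the four edges from `cE` and `cF` to `cA` and `cB`; no edge links `cE` and `cF`.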


4 Modular Verification 


In this section we define a modular verification technique for pathway models. 
We proceed by defining the projection of a pathway with the help of the 
identified components. Such a projection can be seen as an abstraction, 
giving rise to abstract pathways. We prove that a successful verification of a 
property in the abstraction implies its truth in the original model. 


4.1 Abstract Pathways: Syntax, Semantics and Fairness 


We are interested in analysing only a portion of the entire pathway, in
particular a portion induced by only a subset of all components. Let $I =
comp(P)$ and $J \subseteq I$; we define the projection of a pathway P onto J as
the abstract pathway $P\restriction J$. In the following, by abusing the notation, we
assume function species to be also defined on sets of components J as
$species(J) = \bigcup_{x \in J} species(x)$. Moreover, we denote as $u\restriction J$ the projection of
a set of species $u$ over a set of components J, which is formally defined as
$u\restriction J = u \cap species(J)$.


Definition 3. Let P be a pathway. The abstract pathway $P\restriction J$, obtained
by abstracting w.r.t. a set of components $J \subseteq comp(P)$, is defined as
$P\restriction J = (PR_c, PR_{nc}, AR_c, AR_{nc})$, where:

$$PR_c = \{R \in P \mid comp(R) \subseteq J,\ cat(R) \neq \emptyset\}$$

$$PR_{nc} = \{R \in P \mid comp(R) \subseteq J,\ cat(R) = \emptyset\}$$

$$AR_c = \bigcup_{R \in P} \{R\restriction J \mid comp(R) \setminus J \neq comp(R),\ re(R)\restriction J \neq \emptyset,\ cat(R) \neq \emptyset\}$$

$$AR_{nc} = \bigcup_{R \in P} \{R\restriction J \mid comp(R) \setminus J \neq comp(R),\ re(R)\restriction J \neq \emptyset,\ cat(R) = \emptyset\}$$

where $R\restriction J = re(R)\restriction J \to pro(R)\restriction J \ \{cat(R)\restriction J\}$.

An abstract pathway consists of four sets of reactions, namely $PR_c$, $PR_{nc}$,
$AR_c$ and $AR_{nc}$. If we consider sets $PR = PR_c \cup PR_{nc}$ and $AR = AR_c \cup AR_{nc}$,
then reactions are divided into unmodified reactions in PR and
abstract reactions in AR. In particular, PR contains reactions which deal
only with components inside J, while AR contains projections of reactions
that influence components both inside and outside J. On the other hand,
reactions which affect only components outside of J are excluded from the
abstract pathway.

Moreover, both sets of reactions PR and AR are further classified
into catalysed reactions ($PR_c$ and $AR_c$) and non-catalysed reactions
($PR_{nc}$ and $AR_{nc}$). Note that reactions are classified as catalysed/non-catalysed
according to their form in the original pathway, rather than their
resulting form in the abstract pathway. This aspect is actually important
only for reactions in AR, where projecting a catalysed reaction R over J
may cause its abstract version to have an empty set of catalysts if catalysts
are only from components not in J.
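
To make the projection concrete, here is a minimal Python sketch of Definition 3 applied to the normal-form pathway of Example 3. The names `project`, `REACTIONS` and `COMPONENTS`, and the ASCII species names (`P1s` for $P_1^*$), are assumptions of this example. The sketch also assumes that a reaction entirely inside J is placed in PR only, which is how the surrounding prose describes the two classes.

```python
# Hypothetical ASCII encoding: (reactants, products, catalysts).
REACTIONS = {
    "R1": (["P1", "P2"], ["C_P1", "C_P2"], []),
    "R2": (["P1"], ["P1s"], []),
    "R3": (["P1s"], ["P1"], []),
    "R4": (["M1"], ["M1s"], ["P1s"]),
    "R5": (["P1s", "M2"], ["D_P1s", "D_M2"], ["E1"]),
    "R6": (["D_P1s", "D_M2"], ["P1s", "M2"], ["E2"]),
    "R7": (["D_P1s", "D_M2"], ["Z_P1s", "Z_M2"], ["M1s", "C_P1", "C_P2"]),
}
COMPONENTS = {
    "P1": {"P1", "C_P1", "P1s", "D_P1s", "Z_P1s"},
    "P2": {"P2", "C_P2"},
    "M1": {"M1", "M1s"},
    "M2": {"M2", "D_M2", "Z_M2"},
    "E1": {"E1"},
    "E2": {"E2"},
}

def project(reactions, components, J):
    """Split reactions into PRc, PRnc, ARc, ARnc with respect to J."""
    comp_of = {s: name for name, sp in components.items() for s in sp}
    keep = set().union(*(components[c] for c in J))   # species(J)
    out = {"PRc": {}, "PRnc": {}, "ARc": {}, "ARnc": {}}
    for name, (re, pro, cat) in reactions.items():
        comps = {comp_of[s] for s in re + pro + cat}
        # catalysed or not is decided on the ORIGINAL reaction
        suffix = "c" if cat else "nc"
        if comps <= J:
            out["PR" + suffix][name] = (re, pro, cat)
        elif comps & J and any(r in keep for r in re):
            out["AR" + suffix][name] = tuple(
                [x for x in part if x in keep] for part in (re, pro, cat))
    return out

abstract = project(REACTIONS, COMPONENTS, {"P1", "M2", "E1"})
```

With `J = {"P1", "M2", "E1"}` the sketch classifies `R5` as catalysed and unmodified, `R2` and `R3` as non-catalysed and unmodified, `R6` and `R7` as catalysed abstract reactions, and `R1` as a non-catalysed abstract reaction, while `R4` is dropped because none of its reactants lies in J.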


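Example 5. Let P’ be the pathway defined in Example 3, and let J be the
following subset of comp(P’):

$$J = \{\ \{P_1, C_{P_1}, P_1^*, D_{P_1^*}, Z_{P_1^*}\},\ \{M_2, D_{M_2}, Z_{M_2}\},\ \{E_1\}\ \}.$$

The projection of P’ onto J, $P'\restriction J$, is given by the following sets of reactions:

$$PR_c = \{R''_5\} \qquad PR_{nc} = \{R''_2, R''_3\} \qquad AR_c = \{R''_6, R''_7\} \qquad AR_{nc} = \{R''_1\}$$

where reactions are defined as follows:

$$\begin{array}{ll}
P_1 \to C_{P_1} & (R''_1) \\
P_1 \to P_1^* & (R''_2) \\
P_1^* \to P_1 & (R''_3) \\
P_1^*, M_2 \to D_{P_1^*}, D_{M_2} \ \{E_1\} & (R''_5) \\
D_{P_1^*}, D_{M_2} \to P_1^*, M_2 & (R''_6) \\
D_{P_1^*}, D_{M_2} \to Z_{P_1^*}, Z_{M_2} \ \{C_{P_1}\} & (R''_7)
\end{array}$$

Note that: (i) reaction $R'_4$ has no corresponding reaction in $P'\restriction J$; (ii)
although reaction $R''_6$ does not include any catalyst, it is in $AR_c$ since the
corresponding reaction $R'_6$ in P’ included one catalyst.


$$\frac{R \in PR_{nc} \cup AR_{nc} \qquad re(R) \subseteq s \qquad pro(R) \not\subseteq s}{s \xrightarrow{R}_a s \cup pro(R)} \ \text{(no-cat)}$$

$$\frac{R \in PR_c \cup AR_c \qquad re(R) \subseteq s \qquad cat(R) \subseteq s \qquad re(R) \not\subseteq pro(R) \ \lor\ pro(R) \not\subseteq s}{s \xrightarrow{R}_a (s \setminus re(R)) \cup pro(R)} \ \text{(cat)}$$

$$\frac{R \in AR_c \cup AR_{nc} \qquad s \xrightarrow{R}_a s'}{s \xrightarrow{\varepsilon}_a s} \ \text{(self-loop)}$$

$$\frac{\forall R \in PR_c \cup PR_{nc} \cup AR_c \cup AR_{nc}.\ s \not\xrightarrow{R}_a}{s \xrightarrow{\varepsilon}_a s} \ \text{(deadlock)}$$

Figure 4: Inference rules of the abstract semantics.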


Definition 4 (Abstract semantics). Let $P_a = (PR_c, PR_{nc}, AR_c, AR_{nc})$ be
an abstract pathway, and $\overline{P_a} = PR_c \cup PR_{nc} \cup AR_c \cup AR_{nc}$. The abstract
semantics of $P_a$ is defined as the LTS $(\mathcal{P}(species(P_a)), \overline{P_a} \cup \{\varepsilon\}, \to_a)$, where
$\to_a$ is the least transition relation satisfying the inference rules of Figure 4.


In the following, the abstract semantic function is denoted $LTS_a$, i.e.
$LTS_a(P_a) = (\mathcal{P}(species(P_a)), \overline{P_a} \cup \{\varepsilon\}, \to_a)$. Note that a pathway is
essentially a special case of abstract pathway, since the semantics of P is
equivalent (isomorphic) to that of $P\restriction comp(P)$.

Rules (no-cat), (cat), and (deadlock) are analogous to the rules with
the same name from the semantics of Definition 1. Note that rule (no-cat) deals
with the application of non-catalysed reactions from the sets $PR_{nc}$ and
$AR_{nc}$, rule (cat) deals with catalysed reactions from $PR_c$ and $AR_c$, and
rule (deadlock) considers any reaction in $PR \cup AR$. The abstract semantics
also provides rule (self-loop), which allows deriving a self-loop transition for
projected reactions in AR. The purpose of this transition is to model the
case in which a projected reaction $R\restriction J$ is applicable in the abstract state $s$
(i.e. both projected reactants $re(R)\restriction J$ and projected catalysts $cat(R)\restriction J$ are
present in $s$), but for which the corresponding unprojected reaction would
not be applicable in the original state because some reactants/catalysts from
unobservable components are missing.


Example 6. Let $P'\restriction J$ be as in Example 5. The fragment of the abstract
semantics of $P'\restriction J$, denoted $LTS_a(P'\restriction J)$, rooted at $\{P_1, M_2, E_1\}$ is shown
in Figure 5.


[Figure 5: transition diagram over the abstract states $s'_0, s'_1, s'_2, s'_3, s'_7, s'_8, s'_9, s'_{10}, s'_{14}, s'_{15}$; drawing not reproduced.]

Figure 5: Representation of $LTS_a(P'\restriction J)$ rooted at $s'_0 = \{P_1, M_2, E_1\}$, where
$P'\restriction J$ is as defined in Example 5. Set denotations (brackets and commas)
are omitted in state representations, whereas each state is associated with a
short name $s'_i$, where $i$ is as in Figure 2.
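
The rules of Figure 4 are simple enough to execute directly. The following Python sketch, given under the assumption that the abstract pathway of Example 5 is encoded as shown (ASCII names, `P1s` for $P_1^*$, and the helper names `successors` and `reachable` are ours), computes the transitions of a state and explores the state space from $\{P_1, M_2, E_1\}$.

```python
# name: (class, reactants, products, catalysts) after projection onto J.
ABSTRACT = {
    "R1": ("ARnc", {"P1"}, {"C_P1"}, set()),
    "R2": ("PRnc", {"P1"}, {"P1s"}, set()),
    "R3": ("PRnc", {"P1s"}, {"P1"}, set()),
    "R5": ("PRc", {"P1s", "M2"}, {"D_P1s", "D_M2"}, {"E1"}),
    "R6": ("ARc", {"D_P1s", "D_M2"}, {"P1s", "M2"}, set()),
    "R7": ("ARc", {"D_P1s", "D_M2"}, {"Z_P1s", "Z_M2"}, {"C_P1"}),
}
EPS = "eps"
S0 = frozenset({"P1", "M2", "E1"})

def successors(state, pathway):
    """All (label, target) pairs derivable from `state`."""
    moves, abstract_fired = [], False
    for name, (kind, re, pro, cat) in pathway.items():
        if not re <= state:
            continue
        if kind.endswith("nc"):                       # rule (no-cat)
            if pro <= state:
                continue
            target = state | pro
        else:                                         # rule (cat)
            if not cat <= state or (re <= pro and pro <= state):
                continue
            target = (state - re) | pro
        moves.append((name, frozenset(target)))
        abstract_fired |= kind.startswith("AR")
    if abstract_fired or not moves:                   # (self-loop) / (deadlock)
        moves.append((EPS, state))
    return moves

def reachable(start, pathway):
    seen, todo = {start}, [start]
    while todo:
        s = todo.pop()
        for _, t in successors(s, pathway):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

states = reachable(S0, ABSTRACT)
eps_states = [s for s in states if (EPS, s) in successors(s, ABSTRACT)]
```

Under this encoding the exploration reaches ten states, and seven of them carry an `eps` transition: six because an abstract reaction is applicable there, and one because it is a deadlock.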


Compared with the semantics of the original pathway P (see Figure 2),
the abstract semantics has fewer states. In the abstract semantics,
however, there are a few more transitions with label $\varepsilon$ than in the semantics
of P. These are in states $s'_0$, $s'_1$, $s'_2$, $s'_3$, $s'_9$ and $s'_{10}$. In those states there
are enabled reactions from either $AR_c$ or $AR_{nc}$ (e.g. reaction $R''_1$) that have
been modified by projection and hence depend on some component not in J
that in principle may be blocked in a state that never allows these reactions
to occur. The $\varepsilon$ transition models such a situation.

Abstract fairness


In the case of abstract pathways, fairness constraints can be applied only
to reactions in PR. In fact, note that, given a state $s$, each applicable
reaction $R$ in AR allows deriving two transitions: (i) $s \xrightarrow{R}_a s'$ (using either
rule (cat) or rule (no-cat), for some $s'$), describing the application of the reaction,
and (ii) $s \xrightarrow{\varepsilon}_a s$ (using rule (self-loop)), describing the non-application of
the reaction. Since each applicable reaction in AR allows deriving these
two transitions, it might happen that one of the two is always preferred,
capturing the situation in which the corresponding unprojected reaction is
either always enabled or always disabled. In other words, the second transition describes the
case in which some reactants or catalysts from components which have been
projected out are not present in the original state.

The formal definition of abstract fairness, which follows, is analogous
to Definition 2. As we have anticipated, the only difference is that it deals
only with unmodified reactions included in PR.


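Definition 5 (Abstract fairness). Let $P_a = (PR_c, PR_{nc}, AR_c, AR_{nc})$ be an
abstract pathway. A path $\pi$ in $LTS_a(P_a)$ is fair iff it satisfies the following
LTL formula:

$$\Phi_a = \bigwedge_{R \in PR_c \cup PR_{nc}} (\mathbf{G}\mathbf{F}\ enabled(R)) \to (\mathbf{G}\mathbf{F}\ occurs(R))$$

where

$$enabled(R) = \begin{cases}
\left(\bigwedge_{r \in re(R)} r\right) \land \left(\bigwedge_{c \in cat(R)} c\right) & \text{if } R \in PR_c,\ re(R) \not\subseteq pro(R) \\
\left(\bigwedge_{r \in re(R)} r\right) \land \left(\bigvee_{p \in pro(R)} \neg p\right) \land \left(\bigwedge_{c \in cat(R)} c\right) & \text{otherwise;}
\end{cases}$$

$$occurs(R) = \mathbf{X}_R\ true.$$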


Example 7. Let us consider the semantics of $P'\restriction J$ described in Example 6
and shown in Figure 5. According to the definition of abstract fairness we
have that the following paths are fair:

$$\pi'_1 = s'_0, R''_2, s'_1, R''_1, s'_8, R''_5, s'_9, R''_2, s'_{10}, R''_7, s'_{15}, \varepsilon, s'_{15}, \varepsilon, \ldots$$

$$\pi'_2 = s'_0, R''_2, s'_1, R''_5, s'_2, R''_2, s'_3, \varepsilon, s'_3, \varepsilon, s'_3, \varepsilon, \ldots$$

On the other hand, the following paths are not fair:

$$\pi'_3 = s'_0, R''_2, s'_1, R''_5, s'_2, R''_6, s'_1, R''_5, s'_2, R''_6, s'_1, R''_5, s'_2, R''_6, \ldots$$

$$\pi'_4 = s'_0, R''_2, s'_1, R''_5, s'_2, \varepsilon, s'_2, \varepsilon, s'_2, \varepsilon, \ldots$$

Paths $\pi'_1$ and $\pi'_3$ are analogous to paths $\pi_1$ and $\pi_3$ in Example 2. Path $\pi'_2$ is
fair since both reactions enabled in $s'_3$ are not subject to the fairness condition,
as they are in AR. Finally, path $\pi'_4$ is not fair since reaction $R''_2$, which is
enabled in $s'_2$, is in PR, and hence the fairness condition applies to it.
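
For ultimately periodic paths, abstract fairness reduces to a check on the repeated part: a reaction of PR that is enabled somewhere in the loop must also label some transition of the loop. The Python sketch below illustrates this reading of Definition 5; the function names, the lasso representation as a list of (state, outgoing label) pairs, and the ASCII species names are assumptions of the example.

```python
# Unmodified reactions of Example 5: (reactants, products, catalysts, catalysed?)
PR = {
    "R2": ({"P1"}, {"P1s"}, set(), False),
    "R3": ({"P1s"}, {"P1"}, set(), False),
    "R5": ({"P1s", "M2"}, {"D_P1s", "D_M2"}, {"E1"}, True),
}

def enabled(reaction, state):
    """enabled(R) of Definition 5 evaluated on a set of species."""
    re, pro, cat, catalysed = reaction
    if not (re <= state and cat <= state):
        return False
    if catalysed and not re <= pro:
        return True
    return not pro <= state          # some product must be absent

def is_fair(loop):
    """loop: [(state, label of the transition leaving it), ...], repeated forever."""
    labels = {label for _, label in loop}
    return all(
        name in labels or not any(enabled(r, s) for s, _ in loop)
        for name, r in PR.items()
    )

S1 = {"P1", "P1s", "M2", "E1"}
S2 = {"P1", "D_P1s", "D_M2", "E1"}
S3 = {"P1", "P1s", "D_P1s", "D_M2", "E1"}
fair_eps_in_s3 = is_fair([(S3, "eps")])
fair_eps_in_s2 = is_fair([(S2, "eps")])
fair_r5_r6_cycle = is_fair([(S1, "R5"), (S2, "R6")])
```

Looping on `eps` in `S3` is fair because no reaction of PR is enabled there, whereas looping on `eps` in `S2`, or cycling through `R5` and `R6`, starves `R2`, which is enabled in `S2`.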


4.2 Logic for Specifying Properties 


Properties of pathways are specified in temporal logic with species as atomic
propositions. The logic we consider is a fragment of the Computation Tree
Logic CTL. Following Attie and Emerson [1], we assume the logic $\mathrm{ACTL}^-$ for
specification of properties. ACTL is the “universal fragment” of CTL which
results from CTL by restricting negation to propositions and eliminating the
existential path quantifier, and $\mathrm{ACTL}^-$ is ACTL without the AX modality.


Definition 6. The syntax of $\mathrm{ACTL}^-$ is defined inductively as follows:


• The constants true and false are formulae. $\mathsf{s}$ and $\neg\mathsf{s}$ are formulae for
any atomic proposition $\mathsf{s}$, where the set of atomic propositions AP is
the set of all species S.


• If $f, g$ are formulae, then so are $f \land g$ and $f \lor g$.
• If $f, g$ are formulae, then so are $A[f\ U\ g]$ and $A[f\ U_w\ g]$.


Given a set of components J, we define the logic $\mathrm{ACTL}^-_J$ to be $\mathrm{ACTL}^-$
where the atomic propositions are drawn from $AP_J = species(J)$. We
define the following abbreviations in $\mathrm{ACTL}^-$: $AF f = A[true\ U\ f]$ and
$AG f = A[f\ U_w\ false]$.

Properties expressible by $\mathrm{ACTL}^-$ formulae represent a significant class
of properties investigated in the systems biology literature as identified
in [31], such as properties concerning exclusion (“It is not possible for a
state s to occur”), necessary consequence (“If a state $s_1$ occurs, then it is
necessarily followed by a state $s_2$”), and necessary persistence (“A state s
must persist indefinitely”). On the other hand, properties such as occurrence,
possible consequence, sequence and possible persistence are of inherently
existential nature, and are not expressible in $\mathrm{ACTL}^-$.

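We define the semantics of $\mathrm{ACTL}^-$ formulae on a generic labelled
transition system LTS, where each state corresponds to a subset of the
possible species S, and transitions are labelled by reactions. This requires
defining the satisfaction relation $\models_\Phi$ over both states and paths, where $\Phi$ is an
LTL formula specifying the fairness constraint. In particular, $LTS, s \models_\Phi f$
means that $f$ is satisfied by state $s$ of LTS, while $LTS, \pi \models_\Phi f$ means that
$f$ is satisfied by path $\pi$ of LTS. In both cases we consider only fair paths,
i.e. paths satisfying $\Phi$.


$$\begin{array}{lcl}
LTS, s \models_\Phi true & & \\
LTS, s \not\models_\Phi false & & \\
LTS, s \models_\Phi \mathsf{s} & \iff & \mathsf{s} \in s \\
LTS, s \models_\Phi \neg\mathsf{s} & \iff & \mathsf{s} \notin s \\
LTS, s \models_\Phi f \land g & \iff & LTS, s \models_\Phi f \text{ and } LTS, s \models_\Phi g \\
LTS, s \models_\Phi f \lor g & \iff & LTS, s \models_\Phi f \text{ or } LTS, s \models_\Phi g \\
LTS, s \models_\Phi A f & \iff & \text{for all paths } \pi \text{ starting from } s \text{ such that } \pi \models_{LTL} \Phi \text{ it holds } LTS, \pi \models_\Phi f \\
LTS, \pi \models_\Phi f\ U\ g & \iff & \pi = s_0, R_0, s_1, R_1, \ldots \text{ and } \exists k \geq 0.\ LTS, s_k \models_\Phi g \text{ and } \forall 0 \leq j < k.\ LTS, s_j \models_\Phi f \\
LTS, \pi \models_\Phi f\ U_w\ g & \iff & \pi = s_0, R_0, s_1, R_1, \ldots \text{ and } \forall k \geq 0.\ \text{if } LTS, s_k \not\models_\Phi f \text{ then } \exists 0 \leq j \leq k.\ LTS, s_j \models_\Phi g
\end{array}$$

Figure 6: Satisfaction relation for $\mathrm{ACTL}^-$ formulae.


Definition 7. Let $LTS = (\mathcal{P}(S), P, \to)$ be an LTS, with S being a finite
set of species and $P \subseteq reactions(S)$, and $\Phi$ an LTL formula specifying the
fairness constraint. The satisfaction relation $\models_\Phi$ is inductively defined as in
Figure 6.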


Example 8. Let us consider $LTS_a(P'\restriction J)$ as in Example 6 with initial state
$s'_0$. It holds

$$LTS_a(P'\restriction J), s'_0 \models_{\Phi_a} AF\,(P_1^* \land \neg M_2)$$

since every fair path terminates with an $\varepsilon$ loop in either $s'_3$, $s'_{10}$, or $s'_{15}$, and
in all of these states $(P_1^* \land \neg M_2)$ holds.

By the theorem to be proved in the next section we have that property
$AF\,(P_1^* \land \neg M_2)$ also holds in P' from Example 3. Consequently, the property
also holds in the original pathway P defined in Example 1.

4.3 Modular Verification Theorems


Now we prove that in order to verify an $\mathrm{ACTL}^-_J$ property for a pathway
P and a set of components J, it is enough to verify the same property in
the abstract semantics of the abstract pathway $P\restriction J$. The principle behind
property preservation is that each path in the semantics of the modelled
pathway must have a corresponding abstract path in the abstract semantics
of a model obtained by projection. This, combined with the fact that $\mathrm{ACTL}^-$
properties are universally quantified (namely, they describe properties that have
to be satisfied by all paths), ensures that if an $\mathrm{ACTL}^-$ property holds in the
abstract semantics of the projection, then it will also hold in the semantics
of the original model. In fact, for the components considered in a projection
the semantics of the original model will contain essentially a subset of the
paths of the projected model.

First we define the path projection, which, from a path in the semantics
of a pathway with set of components $I$, removes the transitions made by
components outside of the portion $J \subseteq I$ and restricts the remaining transitions
to J.


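Definition 8. Let $\pi = s_0, R_0, s_1, R_1, s_2, \ldots$. Then

$$\pi\restriction J = \begin{cases}
(s_0\restriction J, \varepsilon)^\infty & \text{if } \forall i \geq 0.\ s_i\restriction J = s_{i+1}\restriction J \\
s_k\restriction J,\ R_k\restriction J,\ \pi^{k+1}\restriction J & \text{if } \exists k \geq 0.\ s_k\restriction J \neq s_{k+1}\restriction J \text{ and } \forall 0 \leq i < k.\ s_i\restriction J = s_{i+1}\restriction J
\end{cases}$$

where $(s, \varepsilon)^\infty$ denotes the path such that $(s, \varepsilon)^\infty = s, \varepsilon, (s, \varepsilon)^\infty$, and $s\restriction J =
s \cap species(J)$.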
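
On a finite prefix of a path, the projection amounts to dropping every transition that leaves the J-part of the state unchanged and restricting the remaining states to species(J). The following Python sketch illustrates this under simplifying assumptions: it handles finite prefixes only (the infinite $\varepsilon$ tail of the first case is not represented), it keeps reaction names as labels instead of building $R_k\restriction J$, and the name `project_path` is ours.

```python
def project_path(states, labels, keep):
    """Project a finite path prefix onto the species in `keep`.

    states: [s0, s1, ..., sn] as sets of species
    labels: [R0, ..., R(n-1)], labels[i] leads from states[i] to states[i+1]
    keep:   species(J)
    Returns the alternating list [state, label, state, ...] of the projection.
    """
    proj = [frozenset(s & keep) for s in states]
    out = [proj[0]]
    for i, label in enumerate(labels):
        if proj[i] != proj[i + 1]:     # the transition is visible in J
            out += [label, proj[i + 1]]
    return out

# Toy prefix: r1 only changes X into Y (outside J), r2 changes A into B.
prefix = project_path(
    [{"A", "X"}, {"A", "Y"}, {"B", "Y"}], ["r1", "r2"], {"A", "B"})
```

In the toy prefix the first transition is invisible in J and is skipped, so the projection consists of the single step labelled `r2`.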


Now we are in the position to present the crucial result, which states
that a fair path in the semantics of a pathway P is projected into an abstract
fair path in the abstract semantics of the abstract pathway $P\restriction J$. It is split
into four lemmas. Lemma 1 states that if there exists a transition in the
semantics of P describing the occurrence of a reaction that changes the
portion of the state specified by J, then a corresponding transition exists
in the abstract semantics of $P\restriction J$. Lemma 2 considers fair paths in the
semantics of a pathway and states that, if from a certain point onwards the
portion of the state induced by J never changes, then there is a corresponding
transition labelled by $\varepsilon$ in the abstract semantics. Lemmas 1 and 2 are used
to prove Lemma 3, which shows that, given a fair path $\pi$ in the semantics, its
projection $\pi\restriction J$ according to Definition 8 is a path in the abstract semantics
of $P\restriction J$. Finally, Lemma 4 states that the abstraction of a fair path is also
fair with respect to the abstract fairness.


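Lemma 1. Let $R \neq \varepsilon$. If $s \xrightarrow{R} s'$ and $s\restriction J \neq s'\restriction J$ then $s\restriction J \xrightarrow{R\restriction J}_a s'\restriction J$.


Proof. The proof is by induction on the inference rules of Definition 1.
Case (no-cat). We assume transition $s \xrightarrow{R} s \cup pro(R)$, such that $s\restriction J \neq
(s \cup pro(R))\restriction J$. Hence $s\restriction J \neq s\restriction J \cup pro(R)\restriction J$, which implies $pro(R)\restriction J \not\subseteq s\restriction J$
and $pro(R\restriction J) \not\subseteq s\restriction J$. Moreover, as regards the reactants, $re(R) \subseteq s$ implies
$re(R)\restriction J \subseteq s\restriction J$, thus $re(R\restriction J) \subseteq s\restriction J$.

In order to show that transition $s\restriction J \xrightarrow{R\restriction J}_a s'\restriction J$ with $s' = s \cup pro(R)$
can always be derived, we consider the different forms of reaction $R$.


1. If either (i) $comp(R) \subseteq J$; or (ii) $comp(R) \setminus J \neq comp(R)$ and $re(R\restriction J) \neq
\emptyset$ then, in both cases, $R\restriction J \in PR_{nc} \cup AR_{nc}$. For the first case, it holds
that $R\restriction J = R$. For the second case, note that by the assumed
positional correspondence between reactants and products and by the
definition of projection we have that $re(R\restriction J) \neq \emptyset$ iff $pro(R\restriction J) \neq \emptyset$.
Hence, if $re(R\restriction J)$ was empty we would have also $pro(R\restriction J)$ empty and
hence $s\restriction J = s'\restriction J$. Rule premises are already satisfied, thus transition
$s\restriction J \xrightarrow{R\restriction J}_a s\restriction J \cup pro(R\restriction J)$ can be derived, where the target state is
such that $s\restriction J \cup pro(R\restriction J) = s\restriction J \cup pro(R)\restriction J = (s \cup pro(R))\restriction J = s'\restriction J$.


2. The case $comp(R) \cap J = \emptyset$ cannot occur, since it would imply that
$pro(R)\restriction J = \emptyset$, which is absurd since $pro(R)\restriction J \not\subseteq s\restriction J$.


Case (cat). Let us assume transition $s \xrightarrow{R} (s \setminus re(R)) \cup pro(R)$ can be
derived by using rule (cat), with $s\restriction J \neq ((s \setminus re(R)) \cup pro(R))\restriction J$. Similarly
to the previous case, premises of the rule imply both $re(R\restriction J) \subseteq s\restriction J$
and $cat(R\restriction J) \subseteq s\restriction J$. Moreover, $(s \setminus re(R) \cup pro(R))\restriction J = (s \setminus re(R))\restriction J \cup
pro(R)\restriction J = (s\restriction J) \setminus re(R\restriction J) \cup pro(R\restriction J)$. Therefore $s\restriction J \neq (s\restriction J) \setminus re(R\restriction J) \cup
pro(R\restriction J)$, which implies that either (i) $re(R\restriction J) \not\subseteq pro(R\restriction J)$, since $re(R\restriction J) \subseteq
s\restriction J$, or (ii) $pro(R\restriction J) \not\subseteq s\restriction J$.
Finally, we consider the different forms of reaction $R$.


1. If either $comp(R) \subseteq J$ or $comp(R) \setminus J \neq comp(R)$ then, in both cases,
$R\restriction J \in PR_c \cup AR_c$, allowing transition $s\restriction J \xrightarrow{R\restriction J}_a (s\restriction J) \setminus re(R\restriction J) \cup
pro(R\restriction J)$ to be derived, where the target state is s.t. $(s\restriction J) \setminus re(R\restriction J) \cup
pro(R\restriction J) = (s \setminus re(R) \cup pro(R))\restriction J$.

2. The case $comp(R) \cap J = \emptyset$ cannot occur, since it would imply both
$re(R)\restriction J = \emptyset$ and $pro(R)\restriction J = \emptyset$, which is absurd since we assumed
that either $re(R)\restriction J \not\subseteq pro(R)\restriction J$ or $pro(R)\restriction J \not\subseteq s\restriction J$.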


Lemma 2. Let $\pi = s_0, R_0, s_1, R_1, \ldots \in LTS(P)$ be such that $\pi \models_{LTL} \Phi$. If
there exists $k \geq 0$ such that $\forall i \geq k.\ s_i\restriction J = s_{i+1}\restriction J$ then $s_k\restriction J \xrightarrow{\varepsilon}_a s_k\restriction J$.


Proof. Let $k \geq 0$ be such that $\forall i \geq k.\ s_i\restriction J = s_{i+1}\restriction J$. We distinguish
three possible cases, according to the form of the path $\pi$, as detailed in the
following.

Case $\forall i \geq k.\ R_i = \varepsilon$. This implies that $\forall R.\ s_i \not\xrightarrow{R}$. As regards the
abstract pathway, no reaction $R \in PR_c \cup PR_{nc}$ is applicable in $s_i\restriction J$,
otherwise transition $s_i \xrightarrow{R} s'$, for some $s'$, could be derived, contradicting
the hypothesis.

Therefore, if there is at least one abstract reaction $R\restriction J \in AR_c \cup AR_{nc}$
applicable in $s_i\restriction J$ (i.e. a transition using either rule (no-cat) or rule (cat) for $R\restriction J$
can be derived) then a corresponding transition $s_i\restriction J \xrightarrow{\varepsilon}_a s_i\restriction J$ can be also
derived by using rule (self-loop). On the other hand, if no abstract reaction
$R\restriction J \in AR_c \cup AR_{nc}$ is applicable in $s_i\restriction J$, then transition $s_i\restriction J \xrightarrow{\varepsilon}_a s_i\restriction J$ can
be derived by using rule (deadlock).

Case $\forall i \geq k.\ R_i \neq \varepsilon$. Reactions $R_i$, $\forall i \geq k$, cannot be used to derive
transitions in the semantics of the abstract pathway since they do not change
$s_i\restriction J$. Let us suppose there exists a reaction $R\restriction J \in AR_c \cup AR_{nc}$ which is
applicable in state $s_k\restriction J$, i.e. $s_k\restriction J \xrightarrow{R\restriction J}_a s'$ for some $s' \neq s_k\restriction J$, using either
rule (no-cat) or rule (cat). Therefore rule (self-loop) can also be applied, allowing
transition $s_k\restriction J \xrightarrow{\varepsilon}_a s_k\restriction J$ to be derived.

On the other hand, let us assume that no reaction $R\restriction J \in AR_c \cup AR_{nc}$
is applicable in state $s_k\restriction J$, therefore rule (self-loop) cannot be applied. Let
us suppose, by absurd, that also rule (deadlock) cannot be applied, namely
$\exists R \in PR_c \cup PR_{nc}.\ s_k\restriction J \xrightarrow{R}_a s'$, for some $s' \neq s_k\restriction J$. Since $\forall i \geq k.\ s_i\restriction J =
s_k\restriction J$, this implies that $\forall i \geq k.\ s_i\restriction J \xrightarrow{R}_a s'$. Because reaction $R$ is the same as
in the original pathway, this also entails that $\forall i \geq k.\ \exists s''.\ s_i \xrightarrow{R} s''$, namely
reaction $R$ is enabled infinitely often in path $\pi$. According to the fairness
constraint, such a reaction $R$ needs to occur in the path, i.e. there must exist
$h \geq k$ such that $s_h \xrightarrow{R} s_{h+1}$ with $s_h \neq s_{h+1}$. Because $R \in PR_c \cup PR_{nc}$, this
implies that $s_h\restriction J \neq s_{h+1}\restriction J$, which is a contradiction since we assumed that
$\forall i \geq k.\ s_i\restriction J = s_{i+1}\restriction J$.

Case $\exists h > k.\ (\forall k \leq i < h.\ R_i \neq \varepsilon) \land (\forall i \geq h.\ R_i = \varepsilon)$. Note
that, by definition of semantics of pathways, it is not possible to have
a transition with label $R$, for some reaction $R$, after a transition with
label $\varepsilon$. The first case can be applied to subpath $\pi'$ starting from $s_h$, i.e.
$\pi' = s_h, R_h, s_{h+1}, \ldots$. This implies that $s_h\restriction J \xrightarrow{\varepsilon}_a s_h\restriction J$, and thus $s_k\restriction J \xrightarrow{\varepsilon}_a s_k\restriction J$
since $s_k\restriction J = s_{k+1}\restriction J = \cdots = s_h\restriction J$.


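Lemma 3. $\pi \in LTS(P)$ with $\pi \models_{LTL} \Phi$ implies $\pi\restriction J \in LTS_a(P\restriction J)$.


Proof. Let $\pi = s_0, R_0, s_1, R_1, s_2, \ldots$. The proof is by cases on the definition
of path projection $\pi\restriction J$ from Definition 8.

Case $\forall i \geq 0.\ s_i\restriction J = s_{i+1}\restriction J$. According to Lemma 2, transition
$s_0\restriction J \xrightarrow{\varepsilon}_a s_0\restriction J$ is present in the abstract semantics, thus $(s_0\restriction J, \varepsilon)^\infty \in
LTS_a(P\restriction J)$.

Case $\exists k \geq 0.\ s_k\restriction J \neq s_{k+1}\restriction J$ and $\forall 0 \leq i < k.\ s_i\restriction J = s_{i+1}\restriction J$. Let
us consider transition $s_k \xrightarrow{R_k} s_{k+1}$; according to the semantics, condition
$s_k\restriction J \neq s_{k+1}\restriction J$ implies that $R_k \neq \varepsilon$. Therefore, it follows from Lemma 1 that
$s_k\restriction J \xrightarrow{R_k\restriction J}_a s_{k+1}\restriction J$. Let us assume that $\pi^{k+1}\restriction J \in LTS_a(P\restriction J)$. It is easy to
see that, according to Definition 8, the starting state of $\pi^{k+1}\restriction J$ is $s_{k+1}\restriction J$.
Therefore, $\pi\restriction J \in LTS_a(P\restriction J)$.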


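Lemma 4. $\pi \in LTS(P)$ with $\pi \models_{LTL} \Phi$ implies $\pi\restriction J \models_{LTL} \Phi_a$.


Proof. Suppose that $\pi \models_{LTL} \Phi$, i.e. $\pi \models_{LTL} \bigwedge_{R \in P} (\mathbf{G}\mathbf{F}\ enabled(R)) \to
(\mathbf{G}\mathbf{F}\ occurs(R))$. Let $P\restriction J = (PR_c, PR_{nc}, AR_c, AR_{nc})$. We want to prove
that $\pi\restriction J \models_{LTL} \Phi_a$, that is $\pi\restriction J \models_{LTL} \bigwedge_{R \in PR_c \cup PR_{nc}} (\mathbf{G}\mathbf{F}\ enabled(R)) \to
(\mathbf{G}\mathbf{F}\ occurs(R))$. It is easy to see that, for any reaction $R$ from $PR_c \cup PR_{nc}$,
it holds that, for all $i \geq 0$:


1. $\pi^i \models_{LTL} enabled(R)$ implies $\pi^i\restriction J \models_{LTL} enabled(R)$;
2. $\pi^i \models_{LTL} occurs(R)$ implies $\pi^i\restriction J \models_{LTL} occurs(R)$.


The proof is concluded by noting that, since $\pi^i$ is a suffix of $\pi$, then, by
Definition 8, $\pi^i\restriction J$ is a suffix of $\pi\restriction J$.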


We conclude this section with the property preservation theorem, which 
states that a successful verification of a property in the abstraction implies 
its truth in the original model. 


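Theorem 1. For a pathway P, $J \subseteq comp(P)$, and $f$ an $\mathrm{ACTL}^-_J$ formula
we have that $LTS_a(P\restriction J), s\restriction J \models_{\Phi_a} f$ implies $LTS(P), s \models_\Phi f$.

Proof. By induction on the structure of $f$.

$f = \mathsf{s}$. By definition of state projection, since $\mathsf{s} \in species(J)$, then
$\mathsf{s} \in s\restriction J$ iff $\mathsf{s} \in s$, i.e. $LTS_a(P\restriction J), s\restriction J \models_{\Phi_a} \mathsf{s}$ iff $LTS(P), s \models_\Phi \mathsf{s}$. Analogously
for $f = \neg\mathsf{s}$.

$f = g \land h$. By induction hypothesis.

$f = g \lor h$. By induction hypothesis.

$f = A[g\ U\ h]$. Given a fair path $\pi$ in $LTS(P)$ starting from $s$, according
to Lemmas 3 and 4 it holds that $\pi\restriction J$ is a fair path in $LTS_a(P\restriction J)$. By definition
of path projection, $\pi\restriction J$ starts from $s\restriction J$. Since $LTS_a(P\restriction J), s\restriction J \models_{\Phi_a} f$, it
holds $LTS_a(P\restriction J), \pi\restriction J \models_{\Phi_a} g\ U\ h$. As a consequence, $\pi\restriction J = s_0, R_0, s_1, R_1, \ldots$
and $\exists k \geq 0.\ LTS_a(P\restriction J), s_k \models_{\Phi_a} h$ and $\forall 0 \leq j < k.\ LTS_a(P\restriction J), s_j \models_{\Phi_a} g$.
By definition of path projection, $\pi = s'_0, R'_0, s'_1, R'_1, \ldots$, and there exists
$k' \geq k$ such that $\forall 0 \leq i < k'.\ \exists 0 \leq j < k.\ s'_i\restriction J = s_j$, and $s'_{k'}\restriction J = s_k$.
Therefore, $LTS(P), \pi \models_\Phi g\ U\ h$ and hence $LTS(P), s \models_\Phi f$.

$f = A[g\ U_w\ h]$. Given a fair path $\pi$ in $LTS(P)$ starting from $s$, according
to Lemmas 3 and 4 it holds that $\pi\restriction J$ is a fair path in $LTS_a(P\restriction J)$. By definition
of path projection, $\pi\restriction J$ starts from $s\restriction J$. Since $LTS_a(P\restriction J), s\restriction J \models_{\Phi_a} f$, it
holds $LTS_a(P\restriction J), \pi\restriction J \models_{\Phi_a} g\ U_w\ h$. This implies that either $LTS_a(P\restriction J), \pi\restriction J
\models_{\Phi_a} g\ U\ h$ or $LTS_a(P\restriction J), \pi\restriction J \models_{\Phi_a} \mathbf{G}\, g$. In the first case the proof is the same
as for $f = A[g\ U\ h]$. In the second case we have that $\pi\restriction J = s_0, R_0, s_1, R_1, \ldots$
and $\forall k \geq 0.\ LTS_a(P\restriction J), s_k \models_{\Phi_a} g$. By definition of path projection,
$\pi = s'_0, R'_0, s'_1, R'_1, \ldots$, and $\forall k' \geq 0.\ \exists k \geq 0.\ s'_{k'}\restriction J = s_k$. Therefore,
$LTS(P), \pi \models_\Phi \mathbf{G}\, g$ and hence $LTS(P), s \models_\Phi f$.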


5 Modelling the EGF Receptor-Induced MAP Kinase Cascade


We apply our modular verification approach to a well-established
computational model of the EGF signalling pathway. We consider the model of
the MAP kinase cascade activated by surface and internalised EGF receptors,
proposed by Schoeberl et al. in [37]. This model includes a detailed
description of the reactions that involve active EGF receptors and several
effectors named GAP, ShC, SOS, Grb2, RasGDP/GTP and Raf. Moreover,
the model describes the activity of internalised receptors, namely receptors
that are no longer located on the cell membrane, but on a vesicle obtained by
endocytosis and floating in the cytoplasm. Such internalised receptors
continue to interact with effectors and to contribute to the pathway functioning,
so the pathway can actually be seen as composed of two almost identical
branches: the first consisting of the reactions stimulated by receptors on
the cell membrane and the second consisting of reactions stimulated by
internalised receptors.

A diagram representing all of the reactions of the pathway considered 
in the model is shown in Figure 7. In the figure, each species is identified 
by a short name, and also by a number (in black) in the interval [1 — 87]. 
The number of species which occur more than once is shown only for one 
of their occurrences. Arrows represent reactions, which are also associated 
with an identifier (in grey), for a total of 102 reactions. The two branches 
of the pathway are partially combined in the figure. In particular, the 
representation of most of the species is combined with the representation 
of its internalised counterpart. In such cases, the number between brackets 
identifies the internalised species. The same holds for reactions: in many 
cases an arrow denotes both a reaction stimulated by receptors in the 
cell membrane and the corresponding reaction stimulated by internalised 
receptors. 

The set of reactions constituting the pathway can be trivially
reconstructed from the diagram in Figure 7. The only non-trivial aspect is related
to the presence in the diagram of some reactions in which one of the reactants
is actually acting as a catalyst. For instance, this happens in the case of
the reactions involving Raf* and MEK, in which Raf* initially binds MEK
and then releases it phosphorylated. We describe these two reactions in the
diagram with the following single catalysed reaction:


MEK → MEK-P {Raf*}


Other species acting as catalysts are MEK-PP, Phosphatase1,
Phosphatase2 and Phosphatase3. By applying the same transformation also to
the reactions they are involved in, we obtain a pathway consisting of 80
reactions, which constitutes the starting point for the application of the
techniques presented in the paper. We call this pathway $P_{EGF}$.

We recall that fairness requires that a reaction that is infinitely often 
enabled is also infinitely often performed. This prevents starvation 
among reactions. In the case of P_EGF the two branches of the 
pathway include reactions that could be involved in infinite loops (e.g. the 
reactions involving MEK and ERK). This means that the semantics of the 
pathway includes behaviours in which only one branch executes forever even 
if the other is constantly enabled. Such unrealistic behaviours are excluded 
by the adoption of fairness. 

Figure 7: Scheme of the EGF receptor-induced MAP kinase cascade [37]. 




Figure 8: Component interaction graph of P_EGF. [Graph not reproduced; the 
surviving node labels are RasGDP, Sos, Raf, MEK, ERK, Phosphatase1, 
Phosphatase2 and Phosphatase3.] 


5.1 Initial State 


We adopt a semi-automatic heuristic procedure to find an initial state of the 
pathway. The idea is the following: for each species s in species(P), if there 
is no reaction creating it (i.e. if s ∉ ⋃_{R∈P} pro(R)), then s is present in 
the initial state. This means that species that cannot be produced are assumed 
to be present in the initial state; otherwise their presence in the model 
would not be meaningful. Subsequently, we resort again to the partitioning 
of species according to components to find other species to be inserted. In 
particular, we find those components containing no species marked as present 
in the previous phase. These components must contain loops, hence we manually 
choose some of their species to insert. All other species are assumed 
absent. 
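
The two phases of the heuristic can be sketched as follows. This is only an 
illustration under stated assumptions: reactions are given as (reactants, 
products) pairs, components as a mapping from a name to its species, and the 
manual step is supplied as a dictionary; the function name `initial_state` 
and the toy species are hypothetical. 

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (reactants, products)


def initial_state(species: Set[str],
                  reactions: Iterable[Reaction],
                  components: Dict[str, Set[str]],
                  manual_choice: Dict[str, Set[str]]) -> Set[str]:
    # Phase 1: every species that no reaction produces is initially present.
    produced: Set[str] = set()
    for _, products in reactions:
        produced |= products
    phase_one = {s for s in species if s not in produced}

    # Phase 2: a component with no species selected in phase 1 needs a
    # manually chosen starting species (such a component contains loops).
    present = set(phase_one)
    for name, members in components.items():
        if members & phase_one:
            continue
        chosen = manual_choice.get(name, set()) & members
        if not chosen:
            raise ValueError(f"component {name!r} needs a manually chosen species")
        present |= chosen
    return present


# Toy pathway (not taken from the EGF model): S -> P, and a loop X <-> X-P.
toy_species = {"S", "P", "X", "X-P"}
toy_reactions = [
    (frozenset({"S"}), frozenset({"P"})),
    (frozenset({"X"}), frozenset({"X-P"})),
    (frozenset({"X-P"}), frozenset({"X"})),
]
toy_components = {"substrate": {"S", "P"}, "kinase": {"X", "X-P"}}
result = initial_state(toy_species, toy_reactions, toy_components,
                       {"kinase": {"X"}})
print(sorted(result))
```

In the toy pathway S is selected automatically because nothing produces it, 
whereas the looping component requires the manual choice of X. 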


5.2 The Model 


The model P_EGF is made up of 143 species and 80 reactions. It is in the 
form assumed in Section 3.1, so no preprocessing is needed. The 
components identification procedure identifies 14 components. 
Figure 8 shows the component interaction graph of P_EGF. 
Each node of the graph is labelled by the intuitive name that we have 
chosen for the component. 

Visually, we can make some simple observations on the component 
interaction graph. We can identify enzymes like Phosphatase1, Phosphatase2 and 
Phosphatase3. We can see the first part of the pathway corresponding to 
the EGF receptor and its interaction with effectors, and its connection to 
the MAP kinase cascade through the component RasGDP. 




5.3 Experiments 


In this section we exploit the NuSMV model checker to perform some 
experiments on the model of the EGF pathway. NuSMV includes model 
checking algorithms that allow fairness constraints to be taken into account. 
We rely on such algorithms to handle the fairness constraints introduced in 
this paper. Moreover, in order to carry out the projection and encode the 
resulting abstract pathway in the NuSMV format we have developed a tool 
(available upon request). 

The first experiment is aimed at showing how modular verification could 
be applied to verify a global property of the pathway, namely that the final 
product of the pathway is always produced. This can be done in a modular 
way by proving sub-properties in three different model fragments obtained 
by projection. 

Subsequently, a number of experiments are performed with the aim of 
showing how the molecular components we identified in the pathway can be 
used to better understand the pathway dynamics. In particular, we check 
whether there are some molecular components that are not really necessary 
to obtain the final product of the pathway. This will be done by applying 
model checking on models in which molecular components are selectively 
disabled by setting their initial states to false. Also in this case the modular 
verification approach is adopted. 

In this case study modular verification allows properties to be verified 
faster than on the complete model. However, modular verification is still not 
significantly more efficient than verification on the complete model. This is 
due to the projection operation we are considering at the moment, which 
is rather rough. In Section 6 we discuss why this modular verification is 
a promising approach for the analysis of pathways, and how we plan to 
improve the approach to make it substantially more efficient. 

To run the experiments we used NuSMV 2.5.4 on a workstation equipped 
with a 2.80 GHz Intel i5 CPU and 8 GB of RAM, running Ubuntu 
GNU/Linux. In order to make verification faster, NuSMV was executed 
in batch mode, enabling dynamic reordering of BDD variables and 
disabling the generation of counterexamples. 


5.3.1 Modular Verification of a Global Property 


The final product of the MAP kinase cascade activated by surface and 
internalised EGF receptors is species ERK-PP. Since surface and internalised 
receptors activate two different branches of the pathway, we denote by 
ERK-PP the product of the branch activated by the surface receptors and by 
ERK-PPi the product of the branch activated by the internalised receptors. 
The property to be verified is 


AF(ERK-PP ∨ ERK-PPi) (1) 


The property holds in the complete model and its verification required 
260 seconds. By looking at the diagram in Figure 7 we noticed that the 
pathway could be partitioned into three parts, with two species acting as 
“gates”. These two species are (EGF-EGFR*)2-GAP and Raf*. Hence, we 
decided to apply modular verification by splitting property (1) into the 
following three sub-properties: 


AF((EGF-EGFR*)2-GAP) (2) 
AG((EGF-EGFR*)2-GAP → AF Raf*) (3) 
AG(Raf* → AF(ERK-PP ∨ ERK-PPi)) (4) 


Property (2) states that in all paths of the system a state in which species 
(EGF-EGFR*)2-GAP is present is eventually reached. Property (3) states 
that whenever a state is reached in which species (EGF-EGFR*)2-GAP is 
present, then a state in which Raf* is present is eventually reached. Finally, 
property (4) states that whenever a state is reached in which species Raf* 
is present, then a state in which either ERK-PP or ERK-PPi is present is 
eventually reached. It is easy to see that the conjunction of (2), (3) and (4) 
implies (1). 
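
The implication can be spelled out as a short chain of steps. The sketch 
below uses abbreviations that are not part of the original notation (g, r 
and e, introduced in the comment) and assumes that every suffix of a fair 
path is itself a fair path, so that AF can be applied at an intermediate 
state. 

```latex
% Abbreviations (assumptions of this sketch):
%   g = (EGF-EGFR*)_2-GAP,   r = Raf*,   e = ERK-PP \lor ERK-PPi
\begin{align*}
&\text{Let } \pi = s_0 s_1 s_2 \dots \text{ be any fair path from the initial state } s_0. \\
&(2)\;\; s_0 \models \mathbf{AF}\, g
  &&\Rightarrow\; \exists i \ge 0.\; s_i \models g \\
&(3)\;\; s_i \models g \rightarrow \mathbf{AF}\, r
  &&\Rightarrow\; \exists j \ge i.\; s_j \models r \\
&(4)\;\; s_j \models r \rightarrow \mathbf{AF}\, e
  &&\Rightarrow\; \exists k \ge j.\; s_k \models e \\
&\text{Since } \pi \text{ is arbitrary, } s_0 \models \mathbf{AF}\, e, \text{ which is property (1).}
\end{align*}
```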

We considered three projections of the complete model to be used 
to verify properties (2), (3) and (4), respectively. In particular, from the 
component interaction graph of the model (shown in Figure 8) we extracted 
the following subsets to be used for projections: 


• in order to verify (2) we considered the subset J1 consisting of 
components EGF, EGFi, EGFR and GAP; 

• in order to verify (3) we considered the subset J2 consisting of 
components EGFR, GAP, Shc, RasGDP, Grb2 and Sos; 

• in order to verify (4) we considered the subset J3 consisting of 
components RasGDP, Raf, MEK, ERK, Phosphatase1, Phosphatase2 and 
Phosphatase3. 

We obtained that (2), (3) and (4) hold in the abstract semantics of the 
abstract pathways P⌈J1, P⌈J2 and P⌈J3, respectively. Moreover, model 
checking required less than three seconds for (2), 213 seconds for (3) and less 
than one second for (4). Overall, modular verification required 217 seconds, 
that is 43 seconds less than verification on the complete model. Note that 
the time needed to perform pathway projection over a set of components is 
negligible. 


5.3.2 Reasoning on Molecular Components 


As can be seen in the component interaction graph and in the diagram in 
Figure 7, some molecular components are involved in complex interactions. 
This is true in particular for components EGFR, GAP, RasGDP, Sos, Shc 
and Grb2, which form a clique in the component interaction graph. We 
are interested in understanding whether all of these components are really 
necessary in order to obtain the final products of the pathway. The idea 
is to test whether the final species are produced when the components of 
interest are disabled one at a time. Molecular components EGFR 
and RasGDP are certainly necessary since they connect the clique with the 
other molecular components of the pathway. Consequently, we focus our 
analysis on GAP, Sos, Shc and Grb2. 

In order to disable a molecular component we consider as absent all of 
its species in the initial state of the system. Hence, we consider a set of 
four (complete) models, each with one of the four components under study 
disabled. On each model we try to verify property (1): if the property does 
not hold, then the component that is disabled in such a model is necessary for 
the pathway; on the other hand, if the property holds, then the component 
turns out to be not necessary since the products of the pathway can be 
obtained even without it. The same tests can be also done in a modular way 
by decomposing the pathway and the property as in Section 5.3.1. 
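
The testing procedure can be sketched as follows. The model checker is 
abstracted as a callback `holds` that tells whether property (1) holds from 
a given initial state. In the sketch this callback is a deliberately 
simplistic stand-in (a forward closure in which species are never consumed), 
not the fair CTL model checking performed with NuSMV; all names and the toy 
pathway are hypothetical. 

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (reactants, products)


def necessary_components(initial: Set[str],
                         components: Dict[str, Set[str]],
                         candidates: Iterable[str],
                         holds: Callable[[Set[str]], bool]) -> Dict[str, bool]:
    """Map each candidate component to True if disabling it falsifies the property."""
    verdict: Dict[str, bool] = {}
    for name in candidates:
        # Disabling a component: all of its species are absent initially.
        disabled_state = set(initial) - components[name]
        verdict[name] = not holds(disabled_state)
    return verdict


def toy_checker(reactions: List[Reaction], product: str) -> Callable[[Set[str]], bool]:
    """Stand-in for a model checker: is `product` in the forward closure?"""
    def holds(initial: Set[str]) -> bool:
        present = set(initial)
        changed = True
        while changed:
            changed = False
            for reactants, products in reactions:
                if reactants <= present and not products <= present:
                    present |= products
                    changed = True
        return product in present
    return holds


# Toy pathway: R + G -> RG -> Out, plus a side branch R + H -> RH.
toy_reactions = [
    (frozenset({"R", "G"}), frozenset({"RG"})),
    (frozenset({"RG"}), frozenset({"Out"})),
    (frozenset({"R", "H"}), frozenset({"RH"})),
]
toy_components = {"G": {"G", "RG"}, "H": {"H", "RH"}}
verdict = necessary_components({"R", "G", "H"}, toy_components, ["G", "H"],
                               toy_checker(toy_reactions, "Out"))
print(verdict)
```

In the toy pathway G turns out to be necessary and H does not, mirroring 
the kind of answer sought for GAP, Sos, Shc and Grb2. 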

In Table 1 we summarise the property verification results and compare 
verification times obtained by model checking the complete models and by 
following the modular approach. The first row of data in the table reports 
verification results in which no component is disabled (as in Section 5.3.1). 
The other results show that Shc is not a necessary component, whereas all of 
the other three are. As before, the time required by modular verification 
is smaller than that required by model checking the complete model. This 
is particularly evident when GAP is disabled, since property (2), 
whose verification is very fast, turns out to be false. 

Note that in the case of modular verification of the models in which 
GAP, Sos and Grb2 were disabled we needed to verify some additional 
properties. In particular, in the case of GAP property (2) 
does not hold in the abstract semantics of P⌈J1, and in the cases of Sos 
and Grb2 property (3) does not hold in the abstract semantics of P⌈J2. 
We remark that our modular verification approach guarantees only that 
properties proved to hold in a model fragment also hold in the complete 
model. Nothing can be said, instead, about properties that do not hold in 
the model fragments. In order to avoid applying model checking to the 
complete model to check whether these properties hold there, we consider 
some new properties whose satisfaction in suitable model fragments implies 
that properties (2) and (3) actually do not hold. In order to prove that (2) 
is actually false when GAP is disabled we consider the following property: 


AG(¬(EGF-EGFR*)2-GAP) (5) 


In order to prove that (3) is actually false when either Sos or Grb2 is disabled 
we consider the following property: 


AG(¬Raf*) (6) 


Note that it is convenient to verify properties (5) and (6) together with 
(2) and (3), respectively. This avoids spending twice the time needed by 
the model checker to construct the data structure necessary to perform 
the verification. In the case of our experiments the construction of such 
data structures usually takes 98%-99% of the verification time. Times 
reported in Table 1 are based on this optimisation. 
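
The reasoning behind property (5) can be made explicit as follows. The 
sketch writes g for (EGF-EGFR*)2-GAP and uses \lceil for projection onto a 
set of components; it assumes that at least one fair path starts from the 
initial state. For property (6) the analogous argument refutes (3) only 
under the additional assumption that some reachable state contains 
(EGF-EGFR*)2-GAP, since otherwise (3) would hold vacuously. 

```latex
% g = (EGF-EGFR*)_2-GAP;  P \lceil J_1 is the projection of P onto J_1
\begin{align*}
P \lceil J_1 \models \mathbf{AG}\, \neg g
  \;&\Rightarrow\; P \models \mathbf{AG}\, \neg g
  && \text{(properties of a fragment hold in the complete model)} \\
  \;&\Rightarrow\; \text{no state on any fair path of } P \text{ satisfies } g \\
  \;&\Rightarrow\; P \not\models \mathbf{AF}\, g
  && \text{(a fair path exists), i.e. (2) is false in } P.
\end{align*}
```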


6 Discussion and Conclusions 


In this paper we presented preliminary results in the development of a 
modular verification framework for biochemical pathways. We defined a 
modelling notation for pathways associated with a formal semantics and a 
notion of fairness that allows the dynamics to be accurately described by 
avoiding starvation situations among reactions. Moreover, we investigated a 
notion of molecular component of a pathway and we provided a methodology 
to infer molecular components from pathways the reactions of which satisfy 
some assumptions. Molecular components were then used by a projection 
operation that allows abstract pathways modelling an over-approximation 
of the behaviour of a group of components to be obtained from a pathway 
model. The fact that a property expressed by means of the ACTL⁻ logic 
holds in an abstract pathway was shown to imply that it also holds in the 
complete pathway model. This preservation is at the basis of the modular 
verification approach which was demonstrated on a well-established model 
of the EGF pathway. 


The results of experiments given in Section 5.3 show that our modular 
verification approach allows properties to be verified in a shorter time than 
in the case of verification of the complete pathway model. However, in most 
of the cases the time saved was relatively small (about 15%). We believe that 
this limited gain in efficiency is due to the projection operation 
we are considering at the moment, which is still somewhat rough. Our plan 
to improve efficiency is to improve the projection operation by including in 
it minimisation of components. In particular, projection could be used to 
abstract from some of the model components as it happens now, but also 
to minimise the remaining components. Minimisation should be aimed at 
removing from the model all the reactions describing internal changes of 
the considered components. Indeed, removing internal changes does not affect 
satisfiability of properties (if the state reached after an internal change is 
actually not mentioned in the considered property). Minimisation can be 
obtained by translating a component and its reactions into a finite state 
automaton, and then by applying a standard minimisation algorithm on 
it. Minimisation could also be aimed at replacing sequences of reactions 
with single reactions when they involve more than one component but do 
not present any branching opportunity within the sequence (i.e. no 
other reaction is applicable to the same species within the sequence). 
These minimisation operations would make it possible, for example, to reduce 
the size of the model of the components constituting the clique in the 
component interaction graph in Figure 8. Indeed, most of the reactions 
involving GAP, Shc, Sos and Grb2 could probably be removed from the model. This would 
allow for a significant improvement in modular verification efficiency. 
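
The idea of replacing a sequence of reactions with a single reaction can be 
illustrated with a small sketch. It is not the planned minimisation itself: 
it assumes unary reactions only (one reactant, one product), ignores 
components and the initial state, and collapses an intermediate species only 
when it has exactly one producing and one consuming reaction and is not 
mentioned in the property. The name `collapse_chains` and the toy reactions 
are hypothetical. 

```python
from typing import List, Set, Tuple

Reaction = Tuple[str, str]  # unary reaction: reactant -> product


def collapse_chains(reactions: List[Reaction], observed: Set[str]) -> List[Reaction]:
    """Replace A -> B, B -> C by A -> C when B offers no branching."""
    current = list(reactions)
    changed = True
    while changed:
        changed = False
        for mid in sorted({product for _, product in current}):
            if mid in observed:
                continue  # species mentioned in the property must be kept
            producers = [r for r in current if r[1] == mid]
            consumers = [r for r in current if r[0] == mid]
            if len(producers) != 1 or len(consumers) != 1:
                continue  # branching (or a dead end) at this species
            (src, _), (_, dst) = producers[0], consumers[0]
            if src == dst or producers[0] == consumers[0]:
                continue  # leave loops untouched
            current = [r for r in current
                       if r not in (producers[0], consumers[0])]
            current.append((src, dst))
            changed = True
            break  # recompute the candidate species after each change
    return current


# Toy chain: A -> B -> C, then C branches into D (observed) and E.
toy = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")]
reduced = sorted(collapse_chains(toy, observed={"D"}))
print(reduced)
```

In the toy example only B is removed: C is kept because two reactions 
compete for it, which is exactly the branching situation excluded above. 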


In the case study, the choice of splitting the global property into three 
sub-properties has been crucial since it allowed our modular verification 
methodology to be successfully applied. In general, finding a set of 
sub-properties suitable for modular verification, when one exists, can be a 
difficult task. Most real pathways, however, when observed at a very abstract 
level, show sequential dynamics in which a sequence of intermediate species 
is obtained from the initial substrates until the final product is produced. During 
this roughly sequential process there might be some components involved 
only at the beginning, others only in intermediate stages, and others only at 
the end. The component interaction graph can help the modeller to identify 
which molecular components are involved in which stage, and consequently 
which species are “checkpoints” between subsequent stages. In turn, this can 
suggest how to split the global property into a set of sub-properties in a way 
similar to the one we followed, namely by testing with each sub-property the 
reachability of states in which checkpoint species for stage n are present by 
assuming that checkpoint species for stage n - 1 have already been produced. 
We believe that this approach could also be automatised, although it will not 
be successful for all pathways and for all properties. Indeed, this approach 
is actually an instantiation of the assume-guarantee paradigm for which 
automatisations have been proposed [7]. 
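
The splitting scheme described above can be sketched as a small generator 
of sub-properties from an ordered list of checkpoints. This is an 
illustration only: the function name `split_property`, the textual CTL 
syntax and the identifiers `gap_complex`, `raf_act`, `erk_pp` and `erk_ppi` 
are assumptions standing in for the species of the case study. 

```python
from typing import List


def split_property(checkpoints: List[str]) -> List[str]:
    """Build one reachability sub-property per stage of a sequential pathway.

    checkpoints[0] must eventually be reached; each later checkpoint must
    eventually be reached whenever the previous one is present.
    """
    if not checkpoints:
        raise ValueError("at least one checkpoint is required")
    properties = [f"AF ({checkpoints[0]})"]
    for previous, following in zip(checkpoints, checkpoints[1:]):
        properties.append(f"AG (({previous}) -> AF ({following}))")
    return properties


# Checkpoints mirroring the two gates and the final product of Section 5.3.1.
sub_properties = split_property(["gap_complex", "raf_act", "erk_pp | erk_ppi"])
for formula in sub_properties:
    print(formula)
```

With the three checkpoints above the generator yields formulas of the same 
shape as properties (2), (3) and (4). 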

Moreover, we believe that with more complex and realistic pathways 
splitting properties into sub-properties will be needed less often, since often the 
properties will not be as “global” as in our case study, and there will probably 
be many components whose contribution to the satisfaction of such properties 
is limited. In addition, the definition of a finer projection operation could 
allow property splitting to become less and less necessary. 